The role of emotions in the perception of natural vs. play-acted dubbing: An approach to angry and sad vocal performances

Naranjo, Beatriz

doi:https://doi.org/10.7202/1088351ar

1. The role of voice quality on the screen

Together with the visuals and the script, voices can play a central part in the experience of watching a film. However, the analysis of the human voice in cinema has often been neglected because of its “slippery” and “elusive” character, and it still needs and deserves more attention from scholars (Whittaker and Wright 2012). The vocal performance of screen and voice actors may even determine whether the audience perceives a film as aesthetically pleasing and enjoyable, since it can elicit powerful emotional reactions (Scherer 1995; Shingler 2006). Smith (2007: 164) points out that the human voice can contribute substantially to the “texture of meaning” of film, especially in moments of “vocal release” in which actors convey a specific feeling. The affinity between the dubbers’ voices and the personality and physical features of screen actors has even been envisaged as one of the essential types of synchrony in dubbing, called ‘character synchrony’ (see Díaz Cintas and Orero 2010, commenting on Fodor 1969). When a voice does not harmonize with the character’s body language, that is, when it is “disembodied” in Shingler’s (2006) terms, it may even ruin the viewers’ enjoyment of the film. In some cases, a character’s identity can be strongly built upon vocal peculiarities, such as Marlon Brando’s husky voice in The Godfather or Woody Allen’s distinctive stuttering voice. Therefore, when dubbing into other languages, both the selection of a suitable voice for the character and consistency in voice choice can be vital for credibility and, ultimately, for the financial success of audiovisual products (Bosseaux 2008; 2019). In this regard, Whittaker (2012) explains how the voices of some iconic dubbing actors, such as Constantino Romero in Spain, acquire their own identity on the screen and can even become a “vocal persona.”

The criteria that define the quality of vocal performances may vary depending on the intended result, but research has shown that acoustic parameters such as pitch and intonation, as well as other vocal effects (e.g. vocal fry or breathiness), can be involved in emotional expression and perception in play-acted contexts (Scherer 1995; Gobl and Chasaide 2003; Johnstone, van Reekum, et al. 2006; Shingler 2006; Jürgens, Hammerschmidt, et al. 2011; Pietrowicz, Hasegawa-Johnson, et al. 2017; Bosseaux 2019). Thanks to these properties, listeners are able to differentiate play-acted from natural emotional expressions (Audibert, Aubergé, et al. 2010; Jürgens, Hammerschmidt, et al. 2011; Juslin, Laukka, et al. 2018) as well as to identify specific discrete emotions, such as fear, anger, sadness and joy (see Scherer 1995; Johnstone, van Reekum, et al. 2006). Herbst (1994) carried out an experiment in which students were exposed to original and dubbed voices with the aim of finding out whether they could tell the difference. His results indicate that the differences between the original and the dubbed soundtracks were clearly perceived. The author points out that this would be possible not only because of the type of language used in dubbing, but also because of the unnatural intonation patterns present in dubbed dialogues as a consequence of their being recorded over several takes (1997: 294). Play-acted emotional speech is usually perceived as more stereotypical and overemphasized than spontaneous talk (Jürgens, Hammerschmidt, et al. 2011). In fact, in a study conducted by Juslin, Laukka, et al. (2018), participants judging emotional valence and intensity assigned higher scores to posed voices than to natural voices in clips.

While posed expressions in real-life conversation may not gain social acceptance, since they are perceived as excessively dramatic, they can be acceptable to an audience within the framework of theatre or film. In fact, this tolerance toward play-acted voices seems to extend to dubbed productions. Palencia-Villa (2002) devotes her Ph.D. dissertation to the credibility of characters in Spanish dubbing and concludes that, even though original and dubbed voices may differ in terms of pitch or fundamental frequency, dubbers are still able to preserve the verisimilitude and identity of the screen characters. This may hold especially for viewers accustomed to dubbed voices, who are unlikely to judge them negatively. However, the literature has not yet fully defined the boundaries between naturalness and overacting in play-acted scenarios. It is also unclear whether viewers’ tolerance toward overacting varies depending on the film genre, the specific emotions portrayed and the emotional intensity of the scenes. This study seeks to answer some of these questions in the context of dubbing.

2. The role of naturalness in dubbing

Many scholars and professionals in the dubbing industry have been concerned with the issue of naturalness when trying to assess which features define dubbing quality. From academia, we can highlight contributions by authoritative voices in the field of Audiovisual Translation, such as Chaume (2004), who included realistic dialogues and the avoidance of overacting as essential aspects on his list of quality standards for dubbing. Dubbing professionals also seem to agree with this premise. In interviews with the Spanish voice actress Vicky Tessio (2020)[1] and the producer Paul Chivers (2020)[2] from London Hampstead Music & Voice Studio, both expressed a preference for natural speech as one of the qualities that make a play-acted voice performance successful.

Traditionally associated with linguistic traits of orality, naturalness has been one of the core objects of study in research on dubbing (see Pérez-González 2007; Antonini and Chiaro 2009; Romero-Fresco 2007; 2009; 2012; Pavesi 2018; Spiteri Miggiani 2019; Sánchez-Mompeán 2017; 2019; 2020). Romero-Fresco (2012: 186) defines naturalness as “native-like selection of expressions in a given context.” However, researchers in the field of Audiovisual Translation have become aware that naturalness in dubbed productions is actually “prefabricated” (Chaume 2001; Baños-Piñero and Chaume 2009) and not comparable to the features of real-life conversation or even to screen dialogues in the OV (original version). The language of dubbing, also referred to as “dubbese” (Myers 1973; Pavesi 1996), is therefore characterized by a certain degree of unnaturalness in its linguistic and translation choices compared to spontaneous talk, and yet, as other relevant voices in the field have pointed out, viewers do not seem to dislike it (Antonini and Chiaro 2009; Romero-Fresco 2009; 2012). Research reveals that, while viewers are able to identify unnatural traits in dubbing when they are asked to do so, these traits may go unnoticed, or be tolerated, when viewers are immersed in the experience of watching films. This “syndrome of linguistic bipolarity,” in Antonini and Chiaro’s (2009: 111) terms, can be explained by the phenomenon of suspension of linguistic disbelief (Romero-Fresco 2012), whereby viewers stop questioning how closely play-acted voices resemble real talk once they are aware that they are in front of a screen.

Antonini and Chiaro (2009) raise the question of whether acceptance of dubbed language by the target audience can actually be equated with quality. Romero-Fresco (2009) questions this assumption: starting from the premise that the lack of naturalness in dubbed dialogues is not necessarily due to technical constraints, he argues that unnaturalness may even impoverish the experience of film-watching. The author also defends the idea that, while fictional dialogues are always somewhat straitjacketed, there are ways to attain idiomaticity. On the other hand, Pavesi (2018) argues that naturalness in AVT (audiovisual translation) may be envisaged not only as the choices that most resemble spontaneous conversation, but also in relation to the audience’s expectations within the framework of the filmic experience.

Finally, at the other extreme of the continuum lies the concept of overacting, which has been defined as “crossing the fine line between credibility and parody” (Spiteri Miggiani 2019: 34). Scholars have associated overacting with phony, theatrical voices overladen with emotion (Whitman-Linsen 1992: 47), which may disrupt the suspension of disbelief and may also hamper comprehension. Spiteri Miggiani (2019), who provides an exhaustive account of the dubbing process from an insider perspective, suggests that dialogues may be manipulated to achieve a higher degree of naturalness, especially in crucial scenes in which emotional intensity is prioritized.

3. The distinctive identity of dubbed scripts vs. natural talk

Attempts to identify the specificities of dubbed speech in AVT research have often resorted to corpora of real-life conversation (Marzà and Prats 2018) or of audiovisual programs originally shot in the target language (Baños-Piñero and Chaume 2009) as a benchmark against which specific instances of dubbed language can be tested in terms of frequency and context of use. A review of the literature reveals some convergence in the results obtained to date, pointing to commonalities across the languages of countries with a deep-rooted dubbing tradition, such as Italy and Spain (Antonini and Chiaro 2009; Baños-Piñero and Chaume 2009; Pavesi 2018; Romero-Fresco 2007; 2009; 2012).

With dialogues constituting the structural core of most audiovisual scripts, AVT scholars have mainly taken orality traits and their pragmatic implications as the basic unit of analysis to assess the extent to which naturalness is preserved or lost in dubbed products. The following sections provide an account of the formal linguistic features of orality addressed in previous studies in the field.

3.1. Features of interactional orality

During the 2000s, a plethora of publications in AVT addressed different features of orality related to the interactional character of dubbed dialogues, comparing these dialogues either with their OV counterparts or with corpora of natural speech. The vast majority of studies focused on the Italian and Spanish dubbed renderings of English phraseological units, discourse markers and terms of address.

Scholars who have approached audiovisual orality in fictional products with a corpus linguistics methodology agree on the formulaic character of dubbed dialogues. Such fixedness seems to be due to calqued syntactic structures such as specific cleft sentences (Pavesi 2016), weak connectors, the use of the progressive and split structures (Freddi 2008).

A review of previous literature shows that most studies have tackled orality features on the screen through the prism of discourse analysis and phraseology, with intensifiers and discourse markers being the most widely studied phenomena. Results from previous research reveal that orality markers such as you know or I mean tend to be less frequent in Spanish dubbing than in the English original version or in domestic Spanish productions (Chaume 2004; Baños 2014). Such omissions, together with the use of vague hedges, have been attributed to the need to achieve brevity and speed up communication (Chaume 2004; Quaglio 2009). According to Baños (2014), these orality markers also seem to be more stereotypical and unidiomatic due to interference from the source language; however, the author also identifies some orality markers that were not present in the original dialogues, as well as other phraseological units, prefixes and shortening phenomena, in what is regarded as an attempt to achieve more credibility.

Romero-Fresco (2007; 2009; 2012) analyzes the use of intensifiers and discourse markers in colloquial conversation, comparing their frequency in three different sources: the Spanish dubbed version of a sitcom, a sitcom originally produced in Spanish and a corpus of real-life conversation. According to the author’s findings, some of the translations provided for intensifiers, such as de veras (really in the ST), and for transition markers, such as está bien (fine or okay in the ST), are recurrently used in dubbed scripts and play-acted contexts despite being virtually absent in real talk, which can lead to asymmetries between the original and dubbed versions in terms of the social distance between the characters. Similarly, Marzà (2016) analyzes intensifying traits such as prefixes and suffixes (e.g. súper- and -mente) and other figures of speech, such as hyperbole, and finds a higher frequency of intensified segments in audiovisual corpora and a majority of unnatural intensifiers in dubbing. Moreover, according to Romero-Fresco (2007), Spanish fillers such as pues or hombre, which are rather frequent in spontaneous conversation, are barely present in dubbed dialogues. In the Italian dubbing context, Zanotti (2014) examines general extenders in TV language, defined as expressions with vague semantic content such as ‘and all that’ or ‘and everything.’ The author observes a reduction of these markers in translation, with omission being the preferred strategy; however, direct translation was also used to a certain extent in the corpus (19% of cases).

Although these discrepancies can be interpreted as a lack of naturalness, authors justify the presence of these unnatural features by referring to the humorous purposes of the original product (Marzà 2016) as well as to the notion of suspension of linguistic disbelief, by which consumers of dubbed products cease to question the actual authenticity of the language used on the screen (Romero-Fresco 2009: 68).

Another feature of orality that has drawn the attention of AVT researchers is the translation of terms of address in dubbing. Romance languages such as Spanish and Italian use morphology to convey familiarity or social distance between speakers, for instance through the formal pronouns lei or usted, which have no equivalent in English. English compensates for this lack of grammatical markers with terms of address, a common feature of casual conversation. On the one hand, authors such as Pavesi (2012) have found shifts between the personal pronouns tu and lei in Italian dubbing which signal differences in the characters’ linguistic positioning and the emotional intensity of important narrative events. On the other hand, some AVT researchers (Antonini and Chiaro 2009; Naranjo 2015) have noted that dubbed corpora in Spanish and Italian often present literal translations of these terms of address that are not actually frequent in real-life talk, such as amico [friend], figliuolo [son], hermano [bro, brother] and chica [girl]. Taking a broader approach above the word level, Quaglio (2009) examines the internal structure of conversations in terms of turn-taking and identifies some features in Italian dubbing, such as the absence of overlaps and interruptions, the presence of unexpected features (e.g. pragmatic failure for humorous purposes; Quaglio 2009: 147), as well as the co-occurrence of vague linguistic devices (e.g. sort of) and overly elaborated explanations (Quaglio 2009: 148). In natural conversation corpora, however, he detects longer turns due to the fact that narrative segments are followed by single-word utterances such as sure, right or okay.

Finally, moving away from the corpus linguistics methodology, Pérez-González (2007) proposes a systemic functional approach to studying the naturalness of original and dubbed dialogues that allows researchers to analyze extended stretches of interaction from the perspective of interpersonal relationships. In his analysis, the author observes how spontaneous parts of the original dialogues are sometimes “neutralized by the overall artificiality of the interactional dynamics in the target language” (Pérez-González 2007: 34). Examples extracted from his corpus show how the role or portrayal of some characters in the original version (e.g. the dominant character of one of the speakers) is diluted in the dubbed version since the exchange in the target language fails to reproduce the “sequential build-up” (Pérez-González 2007: 31) found in the original.

3.2. Taboo words

Together with discourse markers and other features of interactional orality, taboo words have also received considerable attention from scholars. The taboo expressions analyzed in previous works mainly include abusive and derogatory swearing, as well as terms related to sexuality, crime, death or killing (see Zanotti 2012; Soler-Pardo 2013; Ávila-Cabrera 2016).

A review of the existing literature on this trait in the Italian and Spanish dubbing context reveals a general tendency to tone down the original taboo expressions, either by omission or by euphemization, and authors have associated this trend with translators’ self-censorship aimed at mitigating the impact on viewers, broadening the target audience and obtaining the approval of the board of censors (Ranzato 2009; Zanotti 2012; Soler-Pardo 2013). It is also worth mentioning that some scholars, such as Ávila-Cabrera (2016), have detected a shift in translators’ behavior between productions from the 1990s and those from the 2000s, with a lower percentage of omissions of offensive language not justified by technical constraints in more recently released films.

Some more recent works have attempted a more fine-grained analysis, finding similar results but with some differences among categories of taboo formulas. Giampieri (2017; 2018) provides an account of the translation of screen insults (such as ‘motherfucker’), as well as expressions with scatological, religious and sexual connotations. The author finds that a large number of the original taboo words containing sexual references were omitted or toned down in her corpus, whereas some religious offences seem to have been replaced with vulgar language.

Another trend detected in more recently released films is literal translation combined with the routinization of taboo expressions, that is, the repeated use of the same target-language expression as a fixed equivalent for specific ST structures. In this regard, authors (Ávila-Cabrera 2016; Giampieri 2017; 2018) have reported on the fixedness of the expressions used, which reflects the fact that translators do not resort to the wide variety of creative solutions available in the target language.

Finally, some works have also revealed a mismatch between the low frequency of taboo expressions in dubbed products and their frequency in real conversation. Antonini and Chiaro (2009) carried out an experiment in which a group of randomly selected viewers watched clips extracted from fictional programs dubbed in Italian and then filled in a questionnaire to check their understanding of specific sections and to assess the likelihood that certain “turbulent” language-specific features occurred naturally in Italian. Taboo words were one of the features included in their corpus of film clips, on the premise that in Italy, at the time, taboo words were frequently omitted in dubbed language even though they were a common feature of everyday Italian. Indeed, their results show that taboo words were rated as one of the features closest to spontaneous talk in Italian.

This tendency has even been observed in translation between culturally close language pairs such as Italian and Spanish. Zamora (2016) examines the Spanish dubbed renderings of the word cazzo in original Italian productions, concluding that syntactic calques and euphemization were the most frequent strategies, even though the degree of social tolerance, potential for conflict and frequency of use of cazzo do not correspond to those of its literal equivalents in everyday Spanish.

To sum up, studies so far have revealed two essential patterns that seem to appear generally in the translation of conversational language and orality traits in both Italian and Spanish dubbed dialogues, namely: 1. interference from the source language; 2. fixedness, or lack of linguistic variety, in the renderings proposed. These tendencies coincide with the conclusions drawn by Pavesi (2008; 2018), who argues that audiovisual dialogues have undergone a process of routinization over the years, i.e. a tendency across languages whereby commonly recurrent ST conversational expressions are consistently translated in the same way in the TT, often through calques and borrowings from the ST.

4. Paralinguistic specificities of dubbed dialogues

While it seems obvious that the oral nature of dubbing calls for attention to the acoustic level, the works cited above examine only specific formal linguistic features (mainly certain lexical choices) and therefore fail to provide a fully comprehensive account of naturalness in dubbed voices. In fact, the importance of paralinguistics is evident from the publication of books that teach professional voice actors techniques to master both human and animal sound effects in animation, such as stammers, laughs, cries and snorts (see, for example, Berry 1973; Wright and Lallo 2013).

With the aim of bridging this gap in academia, recent research by Sánchez-Mompeán (2017; 2019; 2020) goes beyond the actual script to analyze the paralinguistic elements involved in dubbing. Other authors (Bosseaux 2019) had already pointed out the effects of dubbed voices on the target audience as well as the importance of voice as an artistic choice during the dubbing process; however, Sánchez-Mompeán’s work provides a comprehensive account of the different prosodic features involved in dubbing. In a comparative study of the intonation patterns of the English original version and the Spanish dubbed version of a sitcom, she found not only that the intonation used in the Spanish version differed greatly from that of spontaneous talk, but also that a large part of the pragmatic and attitudinal nuances expressed through prosodic patterns in the English OV was lost in the Spanish target text.

Some of the features identified in the author’s corpus that could be defined as distinctive traits of dubbed speech, or “dubbitis,” include fluctuations in pitch, elongation of sounds, tempos different from those that would naturally occur in spontaneous conversation, and excessive or exaggerated articulation that can also be perceived as tense on some occasions. Altogether, these traits can give the impression of a sometimes overacted and sometimes flat vocal performance (Sánchez-Mompeán 2017: 346), which evokes a reading-aloud style.

As for why voice actors tend to adopt such a distinctive style in recording studios, the author puts forward several hypotheses to explain each specific pattern. Bearing in mind that voice professionals working in radio and film are trained to produce clear and intelligible speech for the audience, it is no wonder that precise articulation is also a feature adopted by dubbing actors. Indeed, as revealed in a personal interview with the professional voice actress Vicky Tessio, many of them started out in, or still combine their acting careers with, broadcasting, TV or the theater.

Another argument that, according to Sánchez-Mompeán (2017), would explain the similarities between dubbed talk and reading aloud is the lack of time to memorize the script before the recording sessions in the studio. Unlike screen actors, who need to rely on their memory to perform and may only resort to prompters who provide the opening words when they forget their lines, dubbing actors are not seen by the audience and can still hold the written text in their hands (Sánchez-Mompeán 2017: 29). Therefore, dubbing may resemble a dramatized reading more than screen acting (Sánchez-Mompeán 2017: 35).

The technical constraints of dubbing practice constitute the third significant factor pointed out by the author to explain some of its paralinguistic features. The need to synchronize with the speech rate and the mouth movements of the screen character can also lead dubbing actors to extend the duration of certain sounds, or to slow down or speed up the tempo, in order to make their lines fit the visuals.

Finally, Spiteri Miggiani (2019) identifies interference from the original auditory stimuli as another potential influence. This scholar argues that listening to the dialogues in the OV may lead dubbing actors to imitate what they hear in an effort to be faithful to the character’s identity, thus producing unnatural renderings in terms of volume, pitch or intonation. Therefore, the author proposes that, to attain more naturalness, voice actors should probably try to detach themselves from these auditory stimuli.

In the following sections, we present the results of a quasi-experimental study in which a group of Spanish viewers were asked about their perception and liking of a relatively natural and a relatively play-acted dubbing style in scenes portraying different emotions.

5. Objective and research questions

The main aim of this study was to detect potential differences in viewers’ perception and preference when watching two different dubbing styles (natural vs. play-acted). For this purpose, the following main research questions were posed:

Do viewers distinguish between natural and play-acted dubbed voices?
Do viewers prefer natural over play-acted dubbed voices?

To address these questions, an in-situ study was carried out in which two groups of participants watched two dubbed versions of the same clip featuring different dubbing styles (natural vs. play-acted). Participants were then asked about perceived differences between the two dubbed versions as well as about their dubbing preferences. The details of the study are set out in the following sections.

6. Methods

6.1. Participants

Fifty-nine students (22 male and 37 female) with a mean age of 19.14 years participated in this study. All participants were in their first year at university. Two main criteria were considered for sampling. Firstly, participants had to belong to the same generation of viewers, which ensured, to a certain extent, that they had had similar viewing experiences and held similar expectations of dubbed products. Secondly, they could not be pursuing a degree related to translation or foreign languages, so that they had no specific knowledge of or training in audiovisual translation; this was intended to avoid excessive alertness or strong judgments about dubbing quality triggered by such knowledge and training. Participants were randomly distributed into two groups: Group A, with 9 males and 21 females, and Group B, with 13 males and 16 females. Most participants reported watching dubbed audiovisual products at least weekly, if not daily.

6.2. Materials

Two scenes representing different emotional situations (anger and sadness) were chosen from the film Like Crazy, originally shot in English (see Appendix 1 for more details). For each scene, two versions (natural and play-acted) were translated into Spanish by a professional translator and dubbed by junior voice actors in training. The two Spanish scripts differed slightly in their pragmatic and discursive choices, while their semantic content remained intact (see Appendix 2). We drew on previous literature to design materials that adhered as closely as possible to the traits observed in dubbese vs. spontaneous talk. In the natural version, we tried to move away from the routinization tendency detected in previous works by introducing a variety of discourse markers that would reflect the pragmatic attitudes of the characters in each scene, and we instructed the voice actors to avoid a dramatized ‘reading-for-acting’ style when delivering their lines. Conversational Spanish markers such as en plan, pues, venga and es que were introduced whenever lip-synchronization constraints allowed. In the scene representing anger, we also replaced the emphatic articulation used in some imperatives of the OV to convey higher levels of emotional arousal with the Spanish intensifier que (e.g. ¡que me mires y me lo digas! vs. ¡mírame y dímelo!), as this is a more common strategy in this type of communicative exchange in Spanish. To help actors detach from their play-acted routines, we asked them to imagine how they would talk and react to what the other character was saying if they were going through the same situation in real life.

The translation for the play-acted version, on the other hand, was closer to the original and tried to preserve a dubbese style with calqued formulaic routines. The actors were instructed to intentionally include some vocal and paralinguistic traits identified in play-acted and dubbed speech by previous studies, such as vocal fry, breathiness, fluctuation of pitch, exaggerated articulation and elongation of sounds, especially in inter-word diphthongs where the visuals allowed for it. Dubbed scenes from productions traditionally broadcast at weekends by the Spanish media corporation Atresmedia in its Multicine section were also used as a benchmark, providing the voice actors with clear examples of some of these features.

6.3. Procedures

Participants in Group A watched the two dubbed versions of the scene representing anger (titled ‘FIGHT’), and participants in Group B watched the two versions of the scene representing sadness (titled ‘PHONE CALL’). Group A watched the play-acted version first, followed by the natural version, while the reverse order was applied for Group B to counterbalance potential order effects. Afterwards, participants completed an ad hoc questionnaire of seven questions designed to determine whether they noticed any difference in naturalness between the two versions and whether they preferred one style over the other. The first questions aimed to determine whether participants were actually able to detect the differences without being primed beforehand with clues as to where the difference lay. Only later in the questionnaire were they explicitly asked to assess the naturalness of the versions. Importantly, the questionnaire defined “naturalness” as “more similar to Spanish everyday talk,” so that the concept would not confuse participants.
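The between-group design just described, in which both the scene and the presentation order change from one group to the other, can be written out as a small presentation schedule. The sketch below is our own illustration, not the authors' procedure: the paper does not say how randomization was carried out, so the shuffle-and-split helper `assign_groups`, the fixed 30/29 split (mirroring the group sizes in Section 6.1) and the seed are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One between-group condition: which scene is shown and in which order."""
    scene: str
    order: tuple  # presentation order of the two dubbed versions


# Conditions as described in Section 6.3: each group watches one scene,
# and the order of the two versions is reversed between groups.
CONDITIONS = {
    "A": Condition(scene="FIGHT", order=("play-acted", "natural")),
    "B": Condition(scene="PHONE CALL", order=("natural", "play-acted")),
}


def assign_groups(participant_ids, size_a, seed=None):
    """Hypothetical helper: shuffle participants, put the first `size_a`
    in Group A and the rest in Group B."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: ("A" if i < size_a else "B") for i, pid in enumerate(ids)}


def schedule(assignment):
    """Return one (participant, scene, first_version, second_version) row
    per participant, sorted by participant ID."""
    rows = []
    for pid, group in sorted(assignment.items()):
        cond = CONDITIONS[group]
        rows.append((pid, cond.scene, *cond.order))
    return rows


if __name__ == "__main__":
    # 59 hypothetical participant IDs split 30/29; the seed is arbitrary.
    plan = schedule(assign_groups(range(1, 60), size_a=30, seed=1))
    for row in plan[:3]:
        print(row)
```

Writing the conditions out as a lookup table makes it explicit that each group is defined by a (scene, order) pair rather than by order alone.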

The experiment was conducted in situ in a university classroom. Both the video clips and the questionnaires were provided through a Google Form, and participants’ answers were collected online via their own laptops or tablets. Participants also used their own earphones.

6.4. Analysis

Both quantitative and qualitative analyses were carried out on the collected data. Descriptive and inferential statistical analyses were conducted to examine perceived naturalness and preferences for the two dubbed versions. For this purpose, within-group comparisons between the two versions of each scene were performed with the non-parametric Mann-Whitney test, since the samples did not meet the assumptions of normality and homogeneity of variance. For the inferential statistics, each version was scored 0 or 1 depending on whether or not, respectively, the participant chose it as the more natural/preferred one, since participants had to opt for one version or the other on the basis of naturalness and of preference. Percentages were also calculated for the occurrences of the tags or descriptors that participants provided when asked to indicate specific differences they perceived between the two versions (e.g. stiff, realistic, emotional).
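The scoring and test described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' analysis script (the paper does not name its software). It codes each forced choice as 0 (version chosen) or 1 (not chosen), following the wording of this section, and computes the Mann-Whitney U statistic with the standard normal approximation and tie correction. No continuity correction is applied, so p-values may differ slightly from statistical packages such as SciPy, whose `mannwhitneyu` applies one by default. The example responses are hypothetical.

```python
import math
from statistics import NormalDist


def code_choice(chosen):
    """Section 6.4 coding: 0 if the participant picked this version, else 1."""
    return 0 if chosen else 1


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).
    Returns (U for sample x, z, p)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    tie_sizes = [combined.count(v) for v in set(combined)]
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p


if __name__ == "__main__":
    # Hypothetical forced choices for 30 participants: True = picked the
    # natural version as more natural. These are NOT the study's data.
    picked_natural = [True] * 20 + [False] * 10
    natural = [code_choice(c) for c in picked_natural]
    play_acted = [code_choice(not c) for c in picked_natural]
    u, z, p = mann_whitney_u(natural, play_acted)
    print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
```

Because each participant makes a single forced choice, the two score vectors are complements of each other; the sketch, like the Mann-Whitney test itself, simply treats them as two samples of ranked values.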

7. Results

7.1. Perceived overall differences between versions

Participants perceived the two versions as different from each other: 100% of participants in Group A and 86.2% in Group B reported noticing a difference. Those who did were then asked to explain, using their own tags or descriptors, how the two versions differed, following the scheme “Scene 1 is more/less + adj. + than 2.” Tables 1 and 2 collect, sorted by category, all the descriptors provided by participants in both groups for the natural (marked ‘N’) and the play-acted (marked ‘P’) versions, together with the frequency with which each descriptor appears in participants’ answers.

Two broad categories were used to classify participants’ descriptors: dubbing quality and emotional intensity. Regarding dubbing quality, as the section “Poor Dubbing Quality” of Table 1 shows, no negative tags (0%) were used for the natural version of the scene (‘FIGHT_N’), whereas 28.6% of participants attributed negative tags to the play-acted version (‘FIGHT_P’), mainly describing it as overplayed and stiff or forced.

Conversely, positive tags were more frequent for the natural scene than for the play-acted scene (52.9% vs. 23.8%), most of them labeling it as more natural and realistic. Nonetheless, 23.8% of participants also attributed some positive characteristics to the play-acted version (e.g. understandable, better dubbed, professional).

Descriptors of emotional intensity were classified into two subcategories, high and low intensity, according to their semantic connotations. As Table 1 shows, high-intensity descriptors such as colloquial and vulgar were used more frequently for the natural scene than for the play-acted scene (41.18% vs. 19%). Low-intensity descriptors, in contrast, appeared more frequently in answers about the play-acted version than in answers about the natural version (28.6% vs. 5.88%). These results suggest that the natural scene was perceived as more emotionally intense, presumably because of its colloquial and even vulgar character, whereas the play-acted scene was perceived as more literal, external, cold, robotic, formal or slow.
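The descriptor percentages in Tables 1 and 2 presumably rest on a simple tally of coded tags. A minimal Python sketch of such a tally follows, assuming (hypothetically) a lexicon that maps each tag to one of the categories discussed above, built only from descriptors named in the text, and assuming that percentages are computed over respondents, with each respondent counted at most once per category. The lexicon, category labels and sample answers are illustrative, not the study's coding scheme or data; tags missing from the lexicon are ignored.

```python
from collections import Counter

# Hypothetical coding scheme: descriptors named in the text, mapped to the
# categories used for Tables 1 and 2. A real scheme would be larger.
CATEGORY_OF = {
    "overplayed": "poor quality", "stiff": "poor quality", "forced": "poor quality",
    "natural": "good quality", "realistic": "good quality",
    "understandable": "good quality", "professional": "good quality",
    "colloquial": "high intensity", "vulgar": "high intensity",
    "literal": "low intensity", "cold": "low intensity", "robotic": "low intensity",
    "formal": "low intensity", "slow": "low intensity",
}


def category_percentages(answers: list[list[str]]) -> dict[str, float]:
    """Percentage of respondents whose answer contains at least one tag per category."""
    if not answers:
        return {}
    counts: Counter = Counter()
    for tags in answers:
        # A set, so one respondent counts at most once per category.
        categories = {CATEGORY_OF[t] for t in tags if t in CATEGORY_OF}
        counts.update(categories)
    n = len(answers)
    return {cat: round(100 * c / n, 2) for cat, c in counts.items()}


if __name__ == "__main__":
    # Invented answers from four hypothetical respondents.
    answers = [["natural", "colloquial"], ["realistic"], ["stiff", "forced"], ["cold"]]
    print(category_percentages(answers))
```

Counting each respondent once per category keeps a single answer such as “stiff and forced” from inflating the poor-quality share; if the study counted tag occurrences instead, the denominator and the counting step would change accordingly.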

Table 1

**Descriptors provided by participants in Group A for the scene ‘FIGHT’**

Table 2 below displays the results obtained for the scene ‘PHONE CALL,’ using the same categories of dubbing quality and emotional intensity. First, contrary to ‘FIGHT,’ participants in this group used no descriptors (0%) associated with poor quality for the play-acted scene, whereas 26.32% of the answers about the natural scene contained negative tags such as overacted. Second, positive tags related to naturalness (e.g. natural) were more frequent for the play-acted version (85.71%) than for the natural version (47.37%). Finally, while the natural scene was more frequently described with high-intensity descriptors, such as colloquial or dramatic (19%), than with low-intensity descriptors, such as formal (5.26%), its play-acted counterpart received little attention in terms of emotional intensity, with no high-intensity descriptors (0%) and only one low-intensity tag (dry).

Table 2

**Descriptors provided by participants in Group B for the scene ‘PHONE CALL’**

7.2. Statistical differences in perception of naturalness

This section reports the inferential statistics used to compare the level of naturalness that participants perceived in the play-acted and the natural version of each scene. Table 3 shows descriptive data, namely mean (Mn), median (Md) and standard deviation (SD), for each data sample, together with the results of the Shapiro-Wilk normality test (S-W test), Levene’s test of homogeneity of variance and the Mann-Whitney test. The p-value of the Mann-Whitney test indicates whether the differences between the two versions (natural vs. play-acted) are statistically significant (p < 0.05) or not (p > 0.05).
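Levene’s test, mentioned above, checks homogeneity of variance by running a one-way ANOVA on each observation’s absolute deviation from its group centre. The minimal Python sketch below computes the statistic W and its degrees of freedom; it is an illustration, not the software used in the study. It assumes the original mean-centred variant, since the text does not say which centre was used; SciPy’s `scipy.stats.levene` defaults to `center='median'` (the Brown-Forsythe variant). Converting W into a p-value requires the F distribution, for example `scipy.stats.f.sf(w, df1, df2)`, which the sketch omits to stay dependency-free.

```python
from statistics import fmean
from typing import Sequence


def levene_w(*groups: Sequence[float]) -> tuple[float, int, int]:
    """Mean-centred Levene statistic W with its F degrees of freedom.

    Compare W against an F(k - 1, N - k) distribution to obtain a p-value.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    devs = []
    for g in groups:
        centre = fmean(g)
        devs.append([abs(x - centre) for x in g])
    group_means = [fmean(d) for d in devs]
    grand_mean = sum(sum(d) for d in devs) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    within = sum((z - m) ** 2 for d, m in zip(devs, group_means) for z in d)
    if within == 0:
        raise ValueError("W is undefined when deviations do not vary within groups")
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


if __name__ == "__main__":
    # Invented samples: the second is twice as spread out as the first.
    print(levene_w([1, 2, 3], [2, 4, 6]))
```

A small W (and hence a large p-value) is consistent with equal variances; the text reports that this assumption, together with normality, was not met, which motivated the non-parametric Mann-Whitney test.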

As Table 3 below shows, the highly significant difference between the two versions in Group A (p = 0.00) reveals that the version of ‘FIGHT’ originally recorded in a more natural style was indeed perceived by participants as more natural than the play-acted version. In Group B, however, no significant difference was found between the two versions of ‘PHONE CALL’ (p = 0.09); for this scene, a higher percentage of participants actually perceived the play-acted version as the more natural one.

Table 3

**Comparison of means of perceived naturalness between play-acted and natural versions**

7.3. Specific differentiating traits

When participants were asked more precisely which specific traits had enabled them to tell the two versions apart, three aspects emerged: 1) differences in the language used in the script, 2) paralinguistic features and 3) specific moments of the scene. As Table 4 shows, differences in ‘FIGHT’ were perceived mainly in the type of language used, with participants chiefly noticing specific changes that made the natural version more colloquial. Paralinguistic differences were also perceived, mostly in intonation and in the speed at which the characters spoke. In addition, almost 38% of participants identified the moment when the characters started shouting as the point at which the differences were most noticeable. In ‘PHONE CALL,’ by contrast, paralinguistic changes (especially prosodic ones) were the most clearly perceived differentiating elements, with 70% of participants reporting changes at this level. A further 26% of participants detected changes in the type of language used, and only a small percentage (3.7%) mentioned a specific point in the scene.

Table 4

**Differentiating elements perceived between the play-acted and natural versions of both scenes**

7.4. General dubbing preferences

Regarding participants’ preference for one dubbing style over the other, statistical analyses revealed significant differences only for the scene ‘FIGHT.’ As Table 5 illustrates, in ‘FIGHT’ there is a significant preference for the natural version over the play-acted version (p = 0.000), whereas in ‘PHONE CALL’ no statistically significant differences were found (p > 0.05). The percentages show that participants were split roughly in half, with 44.83% preferring the natural version and 55.17% the play-acted one.

Table 5

**Comparison of means of dubbing preferences (natural vs. play-acted) for ‘FIGHT’ and ‘PHONE CALL’**

Finally, participants were asked to justify their preferences in an open-ended question. For both scenes, the quality of the dubbing in terms of naturalness was the reason most frequently given for preferring one version over the other, with participants describing the preferred version as “more realistic,” “more similar to everyday speech” or “less acted/theatrical.”

8. Discussion and conclusions

In this study we attempted to determine whether viewers prefer a more natural or a more play-acted dubbing style, and whether the emotional content of the scene influences that preference. While some scholars (Pavesi 2016) argue that naturalness in dubbing need not be fully aligned with the features of spontaneous real-life talk, our purpose was to record two radically different versions of the same scene without altering its semantic or pragmatic nuances. One of them can be defined as a play-acted version, whereas its natural counterpart deliberately attempts to violate the expectations of viewers accustomed to dubbing by adopting the paralinguistic attitudes of a native Spanish speaker in the situations portrayed.

The first question put to participants after watching the scenes asked whether they had perceived any differences between versions 1 and 2 and, if so, to provide an evaluative word summarizing how they differed. Even though this question made no reference to the (un)naturalness of the scenes, so as not to condition viewers, the answers revealed that naturalness was in fact the main difference perceived: most of the labels provided by participants, such as overacted, forced, artificial, real, natural or authentic, refer to the degree of naturalness.

Later, naturalness was deliberately introduced in the questionnaire, as participants were asked to compare the two versions and say which one sounded more natural to them. According to our data, differences in naturalness between the two dubbed versions were perceptible only in the scene ‘FIGHT,’ representative of anger, and not in the scene ‘PHONE CALL,’ representative of sadness. For ‘FIGHT,’ these differences were perceived accurately, with 97% of participants identifying the version purposefully dubbed in a natural style as the more natural and realistic one. For ‘PHONE CALL,’ by contrast, a higher percentage of participants (48% vs. 27%) described the play-acted version as more natural than the natural one.

Similarly, preferences for natural dubbing over dubbitis were more salient for the anger scene than for the sad scene, for which a higher number of participants seem to have preferred the play-acted style. It is possible that, as Antonini and Chiaro (2009) suggest for Italian viewers, Spanish spectators have also developed some degree of tolerance towards dubbese and unnatural speech through their exposure to foreign audiovisual products.

Another possible explanation is that the sad scene was perceived as more emotionally dramatic than the anger scene, i.e., that it carried a higher emotional load, inasmuch as the characters sob and cry at a certain point in their conversation and this is accompanied by emotional background music. This may have led participants to assess the play-acted version as more realistic. Admittedly, it could be argued that the emotional intensity of the anger scene is similar to that of the sad scene, as viewers can also see tension escalate throughout the scene to the point where the characters shout loudly at each other; however, previous literature on film and emotions suggests that fictional sadness is usually more powerful and easier to elicit than other emotions (see, for example, Oliver 1993; Keen 2007; Wassiliwizky, Wagner et al. 2015). Other studies in Translation Process Research also suggest that sad narratives and background music may induce higher levels of psychological engagement and empathy with fictional characters (Naranjo 2018; 2019). In this case, emotional contagion or empathy with the characters through fictional sadness may therefore have influenced viewers’ dubbing preferences.

Another factor that may have influenced perception and preferences is the predominant sex in each sample. The group that watched ‘FIGHT’ contained a higher proportion of females than the group that watched ‘PHONE CALL’ (70% vs. 55%). In the academic literature on voice and emotion, researchers have tended to select women for their assessment panels (see Sobin and Murray 1999), since women are allegedly better at distinguishing vocal nuances. This could also explain why naturalness was perceived more accurately in the anger scene.

When asked about the specific aspects of the scene in which they noticed differences, participants also diverged between the two scenes. For the sad scene, paralinguistic traits (e.g. intonation, rhythm) were mentioned more frequently than the script as the main differentiating trait (70.4% vs. 25.9%), whereas the opposite tendency was found for the anger scene, with a higher percentage of mentions for the script (41.4%) than for paralinguistics (20.7%). Some may argue that these divergences stem from viewers’ difficulty in discerning what is natural and what is not when they are in front of the screen. Indeed, Spiteri Miggiani (2019: 37) argues that, because of the suspension of disbelief, spectators would be puzzled if asked whether the intonation of a given dubbed product is natural or not: “Just as dubbese is unconsciously accepted by the audience, so is intonation, as long as it falls within credible parameters, beyond which, one would be left with a parodic effect.”

However, the fact that participants spontaneously mentioned elements such as intonation and rhythm when asked which specific aspects of the speech differed suggests some degree of paralinguistic awareness. This may have been the case here because the play-acted version was perceived as somewhat overacted and, therefore, beyond credible parameters, as Spiteri Miggiani points out; but the divergences in the percentages for each scene also suggest that the emotional tone of a scene may have some influence on perception.

To sum up, in this study differences in naturalness between the natural and play-acted versions were perceptible in the anger scene but not in the sad scene. Likewise, participants preferred the more natural style for the anger scene, while their preferences for the sad scene leaned, without reaching statistical significance, toward the play-acted style. It should be acknowledged, however, that the methodological design of this study does not allow direct comparisons between the scenes ‘PHONE CALL’ and ‘FIGHT,’ since each group watched only the two versions (natural and play-acted) of a single scene. The preceding remarks comparing the perceived effects of the two scenes should therefore be interpreted with caution, given their hypothetical nature. Further studies would be needed to establish whether specific emotions and/or the level of emotional intensity portrayed in a scene play a role in both the perception of and the preference for dubbing naturalness vs. dubbitis.

These findings could have implications for professional dubbing practice. Awareness of spectators’ preferences in scenes with different emotional nuances could inform the decision whether to adopt a more natural or a more play-acted style in the studio. Research including a more varied target audience, other emotions and different audiovisual genres would be necessary to determine whether specific acoustic and vocal features can predict perceived naturalness and preferences in dubbed speech.
