ProPer_Projekt

A workflow for acoustic analysis of speech prosody based on continuous measurements of periodic energy and F0 (requires Praat and R). ProPer analyses cover a wide range of prosodic phenomena, including pitch contours, prominence, speech rate and syllabic structure. ProPer provides rich visual representations and quantification procedures for prosody in speech.


Aviad Albert, Francesco Cangemi, T. Mark Ellison & Martine Grice
IƒL - Phonetik, University of Cologne


Instructions for the ProPer workflow

(last update: 9 July 2022)

0. Before you begin

This workflow requires Praat, R and RStudio.

File names

Audio

TextGrids

Managing and using the workflow files


1) ProPer pre-preparation: Acoustics-to-Praat

Data extraction from Praat (Praat script)

Copy the Praat script from 1) ProPer pre-preparation (Acoustics-to-Praat).praat into a Praat script window (or double-click the file to open it directly in a Praat script window). You should keep the file and folder structure of the original workflow (otherwise, specify the paths to the relevant folders directly in the Praat script or in the script’s prompted form). Also, make sure that your audio files are in the ‘audio/’ directory (likewise, if TextGrids are provided, make sure they are in the ‘praat_data/textgrids/’ directory).

We use the Pitch object in Praat to extract the periodic fraction of the signal from the strength values that are associated with each “pitch candidate” in Praat. The strength scale in the pitch object (from 0 to 1) reflects the extent to which the acoustic signal is similar to itself across selected time points in the autocorrelation function. This similarity characterizes periodic signals, but it is not informative with respect to the amplitude of the signal. To calculate the periodic power, we therefore multiply the periodic fraction by the total power, which we derive from the intensity tier.
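
As a minimal illustration of this relation (not the actual ProPer code; the values, variable names and dB reference below are assumptions), the periodic power can be sketched in R as:

```r
## Sketch: periodic power = periodic fraction (Praat "strength", 0-1)
## multiplied by total power recovered from the intensity tier (dB scale).
## Values and names here are purely illustrative.
intensity_dB <- c(55, 62, 70)      # hypothetical intensity samples (dB)
strength     <- c(0.2, 0.8, 0.95)  # hypothetical periodic-fraction (strength) values

total_power    <- 10^(intensity_dB / 10)  # dB --> linear power (relative to the dB reference)
periodic_power <- strength * total_power
```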

The Praat script is based on mausmooth (Cangemi & Albert 2016), prompting a grouped view of the audio and pitch objects of each item in the list, allowing the user to correct pitch candidates in the pitch object (e.g. fix octave jumps) before the pitch object and the smoothed pitch tier are saved. This behavior can be switched off in the form by deselecting “inspect” (the pitch objects and tiers will be automatically created and saved).

To keep things consistent, the parameters that determine Praat’s intensity and pitch-candidate analyses are “hard-coded” into the script, i.e. their values are given as constants and they don’t show up in the adjustable form. The parameters that do appear in the form only change Praat’s F0 path-finding algorithm, which influences Praat’s choice of F0 among the given candidates. These can be freely adjusted to optimize F0 detection without affecting the calculation of periodic power.

2) ProPer preparation: Praat-to-R

Import Praat data into R tables (create raw_df.csv)

The R code in 2) ProPer preparation (Praat-to-R).Rmd uses the rPraat package to read Praat’s object files directly and to collect all selected parameters into an R data frame with all the relevant raw data from Praat (raw_df).
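
For orientation, a rough sketch of the kind of rPraat calls involved (file paths and column names are illustrative, and the actual .Rmd loops over all files in the praat_data/ subdirectories):

```r
## Sketch: reading Praat objects with rPraat and collecting them into data frames.
## Paths and data-frame layout are illustrative, not the exact ProPer code.
library(rPraat)

it <- it.read("praat_data/intensity_tiers/example.IntensityTier")  # intensity tier
pt <- pt.read("praat_data/pitch_tiers/example.PitchTier")          # smoothed F0 tier

intensity_df <- data.frame(filename = "example", t = it$t, intensity = it$i)
f0_smooth_df <- data.frame(filename = "example", t = pt$t, f0 = pt$f)
```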

Note how ‘filename’ is extracted in each of the data frames (e.g. intensity_df, f0_smooth_df, etc.), and see how ‘speaker’ is further extracted from the file names in fullTime_df. Using this example, you should be able to adapt the code to extract any other file-name variable according to your needs.
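
For example, if file names followed a pattern like “speaker01_item3” (a hypothetical naming convention), a speaker column could be derived along these lines:

```r
## Hypothetical example of extracting a variable from file names:
## everything before the first underscore is taken as the speaker label.
library(stringr)

fullTime_df <- data.frame(filename = c("speaker01_item3", "speaker02_item1"))
fullTime_df$speaker <- str_extract(fullTime_df$filename, "^[^_]+")
```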

Note that this code allows for optional data from Praat TextGrids, which the user is encouraged to create separately and place in the “praat_data/textgrids” subdirectory. The current settings are designed to read two interval tiers, “Syllable” and “Word”, which demarcate units of these sizes with boundaries and text annotations.
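
A rough sketch of reading such a TextGrid with rPraat (the tier name follows the description above; the file path and object names are illustrative):

```r
## Sketch: reading an optional TextGrid and pulling the "Syllable" interval tier.
library(rPraat)

tg <- tg.read("praat_data/textgrids/example.TextGrid")
syll_df <- data.frame(
  t1    = tg$Syllable$t1,     # interval start times
  t2    = tg$Syllable$t2,     # interval end times
  label = tg$Syllable$label   # text annotations
)
```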

3) ProPer visualization: Periograms

Prepare the main data table (create main_df.csv)

The code in 3) ProPer visualization (Periograms).Rmd calculates and shapes the periodic energy curve from the periodic power vector by applying log-transform and smoothing functions. This is enough to achieve the first goal: rich 3-dimensional visualization of pitch contours, a.k.a. Periograms. Periograms show the F0 trajectory with time on the x-axis and frequency on the y-axis, as in most common practice, while also reflecting the strength of the perceived pitch contour continuously in terms of the thickness and darkness of the F0 curve (see Albert et al. 2018, 2019).

The first part of ProPer visualization (Periograms) presents adjustable presets that summarize the important variables for the periodic energy adjustment phase:

Note that we create 4 flavors of smoothing for the periodic energy curve, using low-pass filters of 20, 12, 8 and 5 Hz. The 20 Hz filtering (the least smooth variant) targets segmental-size intervals (as short as 50 ms), while the 5 Hz filtering (the smoothest variant) targets syllable-size intervals (around 200 ms). We provide 2 more smoothing stages between these two ends of the spectrum, with 8 and 12 Hz low-pass filters (corresponding to intervals of 125 and 83 ms, respectively).1
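
To illustrate the idea (this is not necessarily the filter implementation used in the .Rmd), a low-pass smoothing step could look like this with the signal package, assuming the periodic-power vector is sampled at 1000 Hz:

```r
## Sketch: low-pass filtering a periodic-power vector at different cutoffs.
## Filter type, filter order and the 1000 Hz sampling rate are assumptions.
library(signal)

fs <- 1000  # assumed sampling rate of the periodic-power vector (samples per second)

lowpass <- function(x, cutoff_hz, fs) {
  bf <- butter(2, cutoff_hz / (fs / 2), type = "low")  # 2nd-order Butterworth
  filtfilt(bf, x)                                      # zero-phase filtering
}

# smooth_20 <- lowpass(periodic_power, 20, fs)  # least smooth: segmental-size detail
# smooth_5  <- lowpass(periodic_power, 5,  fs)  # smoothest: syllable-size detail
```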

The ggplot sections run in a loop, creating plots for each individual audio file. The plots are saved in pdf format under ‘plots/’, at print quality. Feel free to adjust the look (colors, fonts, etc.). These plots are also crucial for inspecting the data and adjusting the parameters (see above) to achieve optimal periodic energy curves before saving the main_df data frame as a .csv file.
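
As a rough idea of what such a plot involves (the actual ggplot code in the .Rmd is richer; the column names t, f0, periodic_energy and file are illustrative), a periogram-style layer maps periodic energy onto the size and darkness of the F0 trace:

```r
## Sketch of a periogram-style plot: F0 over time, with point size and
## transparency scaled by periodic energy. Column names are illustrative.
library(ggplot2)
library(dplyr)

# single <- filter(main_df, file == "example")
# ggplot(single, aes(x = t, y = f0)) +
#   geom_point(aes(size = periodic_energy, alpha = periodic_energy)) +
#   scale_size(range = c(0, 2)) +
#   scale_alpha(range = c(0, 1)) +
#   labs(x = "Time (ms)", y = "F0 (Hz)") +
#   theme_minimal()
# ggsave("plots/example_periogram.pdf", width = 8, height = 4)
```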

4) ProPer analyses: Synchrony etc.

Detect intervals and perform computations on the data (create comp_df.csv)

The code in 4) ProPer analyses (Synchrony etc.).Rmd is designed to extract quantifiable ProPer data on F0 shape (Synchrony and ∆F0), prosodic prominence (Mass) and local speech rate (see Cangemi et al. 2019 on Synchrony). We start with a boundary detector to locate the intervals of interest. Then, a suite of functions is mapped to the syllabic intervals to calculate the different metrics. Finally, dense plots with superimposed data are produced for presentation and verification before the comp_df data frame is saved as a .csv file.
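
For orientation only, the flavor of these per-interval computations can be sketched as follows; the exact ProPer formulas are in the .Rmd and in Cangemi et al. (2019), and the names and the Synchrony variant below are illustrative, not the definitive definitions:

```r
## Sketch of per-interval metrics: Mass as the area under the periodic-energy
## curve, CoM as its temporal center, and a CoG-vs-CoM distance as one way to
## think about Synchrony. Column names and formulas are illustrative.
library(dplyr)

# comp_sketch <- main_df %>%
#   group_by(file, syll) %>%
#   summarise(
#     mass = sum(periodic_energy),                        # prominence-related mass
#     CoM  = weighted.mean(t, w = periodic_energy),       # center of periodic-energy mass
#     CoG  = weighted.mean(t, w = f0 * periodic_energy),  # an F0 center of gravity
#     synchrony = CoG - CoM,                              # one illustrative variant
#     .groups = "drop"
#   )
```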

The automatic boundary detector is designed to locate local minima in the periodic energy curve while also taking into account information from the optional Syllable tier in a corresponding TextGrid. Manual segmentation can guide the automatic detector and help target specific syllables of interest, and is therefore highly recommended for ProPer analyses.
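
A minimal sketch of the local-minima part of this idea (the actual detector also consults the manual Syllable tier and the adjustable settings described below):

```r
## Sketch: candidate boundaries at local minima of the smoothed periodic-energy
## curve, i.e. points where the curve turns from falling to rising.
find_minima <- function(x) {
  which(diff(sign(diff(x))) == 2) + 1
}
# boundary_idx <- find_minima(smooth_5)
```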

The first part of ProPer analyses (Synchrony etc.) presents adjustable presets that summarize the important variables for the boundary detection algorithm:

The following two variables depend on the previous setting: if useManual = TRUE, you should adjust further using autoMan only; if useManual = FALSE, you should adjust further using averageSyll only.

The following is a brief description of the calculations we perform:

5) ProPer scores: aggregated data

Allocate ProPer values to manually segmented intervals, for data aggregation and stats (create scores_df.csv)

The 5th script suggests a method to allocate the ProPer values to selected syllables, effectively reducing the table to a single row per syllable. This is useful for aggregating the various ProPer metrics, to be presented via descriptive statistics and analyzed with inferential statistics.

ProPer metrics are measured within a periodic energy mass that has a center (CoM). We allocate these ProPer values to the TextGrid-based syllabic intervals that include their center (i.e. the TextGrid interval within which the CoM is found). When there are multiple CoMs in a single interval, the values of the strongest mass are chosen.
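
A hedged sketch of this allocation logic (column and object names are illustrative and carried over from the sketches above; the interval join requires dplyr >= 1.1):

```r
## Sketch: assign each periodic-energy mass to the TextGrid syllable interval
## that contains its CoM, then keep only the strongest mass per interval.
library(dplyr)

# scores_sketch <- comp_sketch %>%
#   left_join(syll_df, join_by(between(CoM, t1, t2))) %>%  # CoM falls inside [t1, t2]
#   group_by(file, label, t1, t2) %>%
#   slice_max(mass, n = 1) %>%                              # strongest mass per interval
#   ungroup()
```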


References

Also: check out the list of studies using ProPer.

Albert, Aviad, Francesco Cangemi & Martine Grice. 2018. Using periodic energy to enrich acoustic representations of pitch in speech: A demonstration. In Proceedings of the 9th International Conference on Speech Prosody. Poznań, Poland. link

Albert, Aviad, Francesco Cangemi & Martine Grice. 2019. Can you draw me a question? Winning presentation at the Prosody Visualization Challenge 2. ICPhS, Melbourne, Australia. link

Barnes, Jonathan, Nanette Veilleux, Alejna Brugos & Stefanie Shattuck-Hufnagel. 2012. Tonal center of gravity: A global approach to tonal implementation in a level-based intonational phonology. Laboratory Phonology 3 (2): 337-383.

Cangemi, Francesco & Aviad Albert. 2016. mausmooth: Eyeballing made easy. Poster presentation at the 7th conference on Tone and Intonation in Europe (TIE). Canterbury, UK.

Cangemi, Francesco, Aviad Albert & Martine Grice. 2019. Modelling intonation: Beyond segments and tonal targets. In Proceedings of the International Congress of Phonetic Sciences. Melbourne, Australia. link

Cook, Perry R. 2001. Pitch, periodicity, and noise in the voice. In Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics. Ed. Perry R. Cook. Cambridge, Mass.: MIT Press.

  1. The code in this script also interpolates and smooths the F0 contour from Praat’s pitch tier. We smooth the final F0 contours with a 6 Hz low-pass filter (166.7 ms intervals), to retain the affinity to syllable-size intervals and because 6 Hz is closest to the naturally occurring vibrato rate in singing voices (Cook 2001).