Thursday, October 6, 2022

Relationship between objective measures of hearing discrimination elicited by non-linguistic stimuli and speech perception in adults


Twenty-three adult listeners who were native speakers of English participated in this study, comprising 13 with normal hearing (NH) and 10 with hearing impairment (HI). The NH group comprised 10 females and 3 males aged between 19 and 55 years (mean = 27, SD = 9.6) who had hearing thresholds within the normal range in both ears (four-frequency average hearing threshold at 0.5, 1, 2 and 4 kHz, or 4FA, below 20 dB HL). The HI group comprised 3 females and 10 males aged between 36 and 81 years (mean = 68.1, SD = 15.0). Their hearing loss ranged from moderate to profound. All HI participants had worn hearing aids for at least 6 months. The study protocol was approved by the Hearing Australia Human Research Ethics Committee. All participants provided written informed consent to participate in this study. This study was carried out in accordance with approved guidelines.

Equipment for behavioral testing

All behavioral tests were conducted in the free field in an acoustically treated booth. The stimuli were presented from a loudspeaker (B&W 600 Series) placed 180 cm from the subject, via a sound card (Fireface 800) controlled by a MATLAB (The MathWorks, Natick) graphical user interface running on a computer. All stimuli for the behavioral discrimination test were equalized in loudness using a loudness model30 implemented in MATLAB. A half-inch Brüel & Kjær microphone was used for calibration. Stimuli were presented at an overall level of 20 sones, measured at the subject position with the subject absent. Speech perception tests were calibrated using a MATLAB user interface and a Brüel & Kjær Hand-Held Analyzer type 2250. The overall presentation level of the speech stimuli was adjusted to match an equivalent continuous noise level (Leq) of 75 dB SPL.

Audiological assessment

Otoscopy was performed first, followed by pure-tone audiometry using ER3 insert earphones and an Interacoustics AD 28 Diagnostic Audiometer31. Hearing thresholds were measured at octave frequencies between 0.25 and 8.0 kHz in each ear using standard procedures32. The four-frequency average (4FA) hearing loss was calculated as the mean of the thresholds obtained at 0.5, 1.0, 2.0, and 4.0 kHz in each ear. The hearing aids worn by HI participants were measured at their personal settings in an HA2 2-cc coupler with a broad-band speech-weighted stimulus, using the Aurical Hearing Instrument Test (HIT) box and the OTOsuite software (Natus Medical Inc., Taastrup, Denmark). Average real-ear-to-coupler differences were used to estimate the levels of the amplified signal in the real ear33. The real-ear levels were used in calculations of audibility.
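The 4FA defined above is a plain arithmetic mean, which can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the dictionary layout, and the example audiogram are hypothetical.

```python
from statistics import mean

# Frequencies (kHz) that enter the four-frequency average, as stated in the text.
FOUR_FA_FREQS_KHZ = (0.5, 1.0, 2.0, 4.0)


def four_frequency_average(thresholds_db_hl):
    """Return the 4FA (dB HL) for one ear.

    `thresholds_db_hl` maps audiometric frequency in kHz to threshold in dB HL.
    Thresholds at other frequencies (0.25 and 8 kHz here) are ignored.
    """
    return mean(thresholds_db_hl[f] for f in FOUR_FA_FREQS_KHZ)


# Hypothetical audiogram for one ear (illustrative values, not study data).
example_ear = {0.25: 10, 0.5: 10, 1.0: 15, 2.0: 20, 4.0: 35, 8.0: 40}
example_4fa = four_frequency_average(example_ear)  # (10 + 15 + 20 + 35) / 4
```

The calculation is applied to each ear separately, as in the text.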

Speech perception

Two types of speech stimuli were used. Consonant-vowel-consonant (CVC) words were presented in a carrier sentence in quiet, and Bamford-Kowal-Bench (BKB)-like sentence test material34 was presented in babble noise. Training was provided before testing.

Speech perception in quiet

Recorded stimuli were presented at 65 dB SPL. Each carrier sentence contained two meaningless words (e.g. "I saw the [CVC] in the [CVC]") and was spoken by a female native speaker of Australian English. Subjects were required to repeat the words they heard. The input level was adjusted in 5 dB steps for the first two reversals, after which the step size was reduced to 2 dB. The speech reception threshold (SRT) was defined as the level at which 50% of the words were correctly repeated. The test stopped when at least 24 stimuli had been presented and 10 reversals had been obtained at the 2 dB step size with a variance of less than 1 dB. The SRT, expressed in dB SPL, was calculated as the mean of the last 10 reversals.
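The adaptive track described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the test software used in the study: the one-down one-up rule is assumed because it targets the 50% point, the 65 dB SPL starting level is assumed from the first sentence, the variance stopping criterion is omitted, and `run_staircase` and the simulated listener are hypothetical.

```python
from statistics import mean


def run_staircase(respond, start_level_db, big_step_db=5.0, small_step_db=2.0,
                  n_big_reversals=2, n_small_reversals=10, min_trials=24,
                  max_trials=500):
    """Track the 50% point with a one-down one-up rule (an assumption).

    `respond(level_db)` returns True when the listener repeats the item correctly.
    Returns (srt_db, reversal_levels).
    """
    level = start_level_db
    reversals = []          # levels at which the track changed direction
    last_direction = 0      # -1 = going down, +1 = going up, 0 = no move yet
    trials = 0
    while trials < max_trials:
        correct = respond(level)
        trials += 1
        direction = -1 if correct else +1   # harder after a hit, easier after a miss
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        # Stop once enough small-step reversals and enough trials are collected.
        small_reversals = len(reversals) - n_big_reversals
        if small_reversals >= n_small_reversals and trials >= min_trials:
            break
        # 5 dB steps until the first two reversals, 2 dB steps afterwards.
        step = big_step_db if len(reversals) < n_big_reversals else small_step_db
        level += direction * step
    srt = mean(reversals[-n_small_reversals:])
    return srt, reversals


# Hypothetical deterministic listener: correct whenever the level is at least 50 dB SPL.
srt_db, reversal_levels = run_staircase(lambda level: level >= 50.0,
                                        start_level_db=65.0)
```

With this idealized listener the track settles into alternating reversals at 48 and 50 dB SPL, so the estimated SRT is 49 dB SPL. A real listener responds probabilistically, which is why the study also required the reversals to have a variance below 1 dB before stopping.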

Speech perception in noise

Recorded BKB-like sentences spoken by a male native speaker of Australian English were presented in babble noise generated from two collocated talkers, both native speakers of Australian English. Subjects were asked to repeat the sentences. The SRT at which 50% of the sentences were correctly repeated was measured. The target speech was fixed at 65 dB SPL, and the babble noise level was adjusted adaptively. The test started at 0 dB SNR, and adjustments were made in 5 dB steps for the first two reversals, then in 2 dB steps. The test stopped when the variance was smaller than 1 dB. The SRT, expressed in dB SNR, was calculated as the mean of the last 10 reversals.

Behavioral discrimination test

Phase inversion discrimination (PID) thresholds were measured using spectrally modulated non-linguistic stimuli, namely spectral ripple noise (SRN). The threshold was the modulation depth at which a 180° phase inversion was perceived 50% of the time.

All stimuli were created by superposing 4000 pure tones equally spaced on a logarithmic frequency scale. Two bandwidth conditions were included: a low-pass (LP) filtered SRN with a bandwidth from 250 to 1,500 Hz, and a high-pass (HP) filtered SRN with a bandwidth from 2,000 to 11,200 Hz. The starting phase of the spectral modulation (ripple) was randomly selected from a uniform distribution (0 to π rad). The ripple density was one ripple per octave. The stimuli had a total duration of 500 ms with 10 ms onset and offset ramps. The inter-stimulus interval (ISI) was 500 ms. To measure the PID threshold, a 3-alternative forced-choice (3AFC) paradigm with a two-down one-up procedure was used. In each trial, one interval (the deviant) had a 180° phase shift relative to the other two (the standards). The participant had to select the interval that differed from the other two. The position of the deviant stimulus in each triplet was randomized across trials. The modulation depth was initially set to 20 dB for the NH group and 25 dB for the HI group. The modulation depth varied automatically in steps of 2 dB for the first two reversals and 0.5 dB for the last ten. The threshold was defined as the average modulation depth of the last 10 reversals. The overall presentation level was 20 sones. All testing was preceded by training.
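The spectral shape of a standard and a deviant SRN can be sketched as below. The text does not give the envelope formula, so this assumes the common definition in which the component gain follows a sinusoid in dB along a log2-frequency axis and the modulation depth is the peak-to-valley difference. The function name, its parameters, and the 0.3 rad starting phase are hypothetical, and only the component gains are computed, not the summed waveform.

```python
import math


def ripple_spectrum(f_lo_hz, f_hi_hz, n_tones=4000, depth_db=20.0,
                    ripples_per_octave=1.0, start_phase_rad=0.0):
    """Return (frequencies_hz, gains_db) for the tone components of one SRN.

    Tones are equally spaced on a log-frequency axis. The gain follows a
    sinusoid in dB along that axis with peak-to-valley size `depth_db`
    (an assumed definition of modulation depth).
    """
    octaves = math.log2(f_hi_hz / f_lo_hz)
    freqs, gains = [], []
    for i in range(n_tones):
        x = octaves * i / (n_tones - 1)          # position in octaves above f_lo
        freqs.append(f_lo_hz * 2.0 ** x)
        gains.append(0.5 * depth_db *
                     math.sin(2.0 * math.pi * ripples_per_octave * x
                              + start_phase_rad))
    return freqs, gains


# High-pass condition from the text: 2,000 to 11,200 Hz, 20 dB depth, 1 ripple/octave.
freqs_hz, standard_db = ripple_spectrum(2000.0, 11200.0, start_phase_rad=0.3)
# The deviant is the same ripple shifted by 180 degrees (pi rad).
_, deviant_db = ripple_spectrum(2000.0, 11200.0, start_phase_rad=0.3 + math.pi)
```

Under this definition the 180° inversion swaps spectral peaks and valleys, so the deviant gain at every component is the negative of the standard gain. Reducing `depth_db` flattens both spectra toward each other, which is what makes modulation depth a usable threshold variable.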

Electrophysiological tests

The electrophysiological test consisted of recording the electroencephalogram (EEG) in response to SRN stimuli with characteristics similar to those used for behavioral testing. Two responses were recorded: the cortical auditory evoked potentials (CAEPs) elicited by the onset of the SRN, and the acoustic change complex (ACC) elicited by a transition between two SRNs with a phase inversion from 0° to 180° (see Fig. 5). The HEARLab system (Frye Electronics, Tigard, OR, USA) was used to record the raw EEG data.

Figure 5

The lower panel shows the waveform of the high-pass filtered spectral ripple noise, bandwidth from 2 kHz to 11.2 kHz, with a phase inversion of 180°, one ripple per octave. The top panel shows the mean EEG recording to the stimulus presented at 20 sones. The cortical responses to the onset and the acoustic change complex (ACC) are shown.


All stimuli were created by concatenating two SRNs that differed by a 180° phase inversion, at one ripple per octave. The total duration of the stimulus was 3 s, with the SRN transition 2 s after onset. To avoid spectral splatter at the transition, each stimulus was windowed with a 40 ms rise-fall ramp and the stimuli were concatenated with a 20 ms overlap. Two modulation depths were investigated for the high-pass filtered condition: 20 and 50 dB. A pilot comparison of responses to high-pass filtered stimuli with either 20 dB or 50 dB modulation depth revealed no significant difference in onset or ACC responses. Therefore, 20 dB modulation depth was used for subsequent testing. The same HP and LP stimuli as in the PID threshold assessment were used. These were presented at a normal conversational level (NL) of 20 sones (around 65 dB SPL) and at a lower input level (LL) of 55 dB SPL. Four stimuli were used: HP20NL (high-pass filtered, 20 dB modulation depth, normal level); HP20LL (high-pass filtered, 20 dB modulation depth, low level); LP20NL (low-pass filtered, 20 dB modulation depth, normal level); and LP20LL (low-pass filtered, 20 dB modulation depth, low level).

Electrophysiological procedure

Testing was conducted in the free field using a single loudspeaker positioned 100 cm from the subject at 0° azimuth. All recordings were obtained in a sound-treated booth while the subject sat in a comfortable chair attending to a muted, closed-captioned movie of their choice. EEG was recorded using four electrodes: a reference electrode at FCz, two active electrodes ("M1" on the left mastoid and "M2" on the right mastoid), and a ground electrode on the forehead35. For each stimulus, 120 epochs were recorded. The presentation order of stimuli was randomized across participants using a Latin square design36. Each condition lasted about 13 min, and the whole session took approximately one hour and fifteen minutes.

The EEG data were post-processed using a custom-designed MATLAB script. Before analog-to-digital conversion, the signal was high-pass filtered at 0.33 Hz by an analog first-order filter. The signal was down-sampled to 1 kHz and low-pass filtered at 30 Hz with a 128th-order zero time-delay filter. The raw EEG of each epoch was cut into two segments corresponding to the onset response (the CAEP evoked by the transition from silence to sound) and the ACC (the CAEP evoked by the phase inversion of the SRN). The recording window for each response consisted of a 200 ms pre-stimulus interval and a 700 ms post-stimulus interval. An artifact rejection algorithm was then applied to exclude any epoch with a maximum absolute value greater than 160 μV or a mean value greater than 60 μV. A weighted average of the two channels was calculated for each epoch. Each recorded epoch was then reduced to voltage levels in nine bins covering the range from 51 to 348 ms, with each bin being 33 ms wide37. Hotelling’s T² statistic was used to obtain an objective measure of response detection: a statistically significant p-value means that electrical activity corresponding to a CAEP was detected as significantly different from zero, that is, from background noise. A z-score was computed from the p-value of the Hotelling’s T² test for both the onset and the ACC responses in each stimulus condition.
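The artifact rejection and binning steps can be sketched as follows, assuming one sample per millisecond (1 kHz) and a 900-sample epoch that starts 200 ms before the stimulus or change onset. This is an illustration, not the authors' MATLAB script: the function names are hypothetical, the bins are assumed to be half-open intervals, and the mean criterion is assumed to apply to the absolute value of the epoch mean. Channel weighting and the Hotelling's T² test itself are not shown.

```python
from statistics import mean

PRE_STIM_MS = 200      # pre-stimulus interval at the start of each epoch
BIN_START_MS = 51      # first bin starts 51 ms after stimulus (or change) onset
BIN_WIDTH_MS = 33
N_BINS = 9             # 9 x 33 ms covers 51 to 348 ms


def is_artifact(epoch_uv, max_abs_uv=160.0, max_mean_uv=60.0):
    """Apply the two rejection criteria from the text to one epoch (microvolts).

    Using abs() on the epoch mean is an assumption; the text only says
    "mean epoch value greater than 60 uV".
    """
    return (max(abs(v) for v in epoch_uv) > max_abs_uv
            or abs(mean(epoch_uv)) > max_mean_uv)


def bin_epoch(epoch_uv):
    """Reduce one 1 kHz epoch (one sample per ms) to nine mean voltages."""
    bins = []
    for k in range(N_BINS):
        start = PRE_STIM_MS + BIN_START_MS + k * BIN_WIDTH_MS
        bins.append(mean(epoch_uv[start:start + BIN_WIDTH_MS]))
    return bins


def binned_clean_epochs(epochs_uv):
    """Drop artifact epochs and return one nine-value row per remaining epoch."""
    return [bin_epoch(e) for e in epochs_uv if not is_artifact(e)]


# Synthetic epochs (not study data): a flat 1 uV epoch, a slow ramp, a 200 uV spike.
flat = [1.0] * 900
ramp = [(i - PRE_STIM_MS) / 10.0 for i in range(900)]
spike = [0.0] * 899 + [200.0]
binned = binned_clean_epochs([flat, ramp, spike])
```

Each surviving epoch becomes one nine-dimensional observation. A one-sample Hotelling's T² test then asks whether the mean of these observations across epochs differs from the zero vector, which is why the binning step precedes the statistic.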

Calculation of audibility of stimuli

Audibility was defined as the difference between the signal level and the higher of the hearing threshold and the noise level. In HI listeners, real-ear measurements were used to calculate the level of the amplified signal at the eardrum in the better ear. Audibility was estimated as the maximum value across the bands of a one-third-octave spectrum, on the basis that audibility is most strongly determined by the frequency region in which it is greatest.
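The audibility definition above reduces to a per-band subtraction followed by a maximum, sketched below. The function name and the three-band example are hypothetical, and the sketch assumes that all levels are expressed in the same units (for example dB SPL in one-third-octave bands at the eardrum).

```python
def audibility_db(signal_db, threshold_db, noise_db):
    """Return the best-band audibility (dB) across one-third-octave bands.

    Each argument is a sequence of band levels in the same order and units.
    Per band, audibility = signal level - max(hearing threshold, noise level).
    """
    if not (len(signal_db) == len(threshold_db) == len(noise_db)):
        raise ValueError("all band sequences must have the same length")
    per_band = [s - max(t, n)
                for s, t, n in zip(signal_db, threshold_db, noise_db)]
    return max(per_band)


# Hypothetical three-band example (illustrative values, not study data).
example_audibility = audibility_db(
    signal_db=[60.0, 55.0, 50.0],
    threshold_db=[40.0, 50.0, 65.0],
    noise_db=[30.0, 52.0, 20.0],
)  # per-band values are 20, 3 and -15 dB, so the maximum is 20 dB
```

Taking the maximum means a listener with one well-amplified frequency region is scored as having good audibility even if other bands fall below threshold, which matches the rationale given in the text.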

Data analysis

To determine the influence of hearing status (NH or HI) on speech perception, PID, and cortical onset and ACC responses, analysis of variance (ANOVA) with repeated measures was used. Repeated-measures ANOVA was also used to determine the influence of presentation level (normal or low) and frequency (HP or LP) on cortical onset and ACC responses. To examine the relationships between PID and cortical responses, between PID and speech perception, and between cortical responses and speech perception, Pearson’s product-moment correlation analyses were carried out. Linear regression analyses were conducted with the SRT in noise as the dependent variable and better-ear audibility together with either behavioral discrimination or ACC z-scores as predictor variables, to estimate the effect of audibility on the relationship between discrimination and speech perception.


