Extended Study of Pitch Shifted Speech by Preserving Tempo: An Experimental Study
S Choudhury, C Singh, M Thakar
frequency domain, pitch shift, speech characteristics
S Choudhury, C Singh, M Thakar. Extended Study of Pitch Shifted Speech by Preserving Tempo: An Experimental Study. The Internet Journal of Forensic Science. 2006 Volume 2 Number 1.
The overall pitch of a recorded speech sample can be altered with the pitch-shift techniques that advances in digital technology have made available. The effect of time-domain pitch shifting on speech characteristics has previously been studied using time warping. Here, the effect of frequency-domain pitch shifting with tempo preserved is studied using speech exemplars from 15 speakers at stretch ratios of 90, 95, 105 and 110 (%), compared with the original exemplar. The effects on F1, F2, F3, nasal formant frequencies, word-segment duration and mean period are analyzed with respect to the overall shift in mean F0. The change in pitch due to stretching is found to be independent of the positions of F1, F2 and F3. However, for a given speaker, the values of F1, F2, F3 and the mean period change linearly.
Note: The paper was presented at the XVI All India Forensic Science Conference 2004, Hyderabad, India, and appeared in its Proceedings.
A change in overall pitch alters the speech characteristics, which makes identifying the speaker a challenging task for the forensic expert [1,2,3,4,5]. Automatic speaker identification systems based on pitch detection suffer from a similar problem [6,7,8]. The shift in pitch may be circumstantial or intentional. Recording speech on a low-grade recorder, recording off-speed because of a low battery or poor power supply, and malfunction of the tape recorder all lead to a change in pitch. Format conversion is another source: because film and video use different frame-rate standards and every frame is displayed, converting from one format to the other changes the playback speed and hence the pitch of the sound. A further example is fitting video footage or speech to a specified length of time. These cases are all circumstantial. The effect of a change in the playback speed of an analog recorder on authenticity examination has been discussed. In certain situations, factors such as tape stretch can also contribute to pitch shift and timing errors, which can be significant relative to the NAB & DIN specifications, as described by McKnight.
Advances in technology and the digital processing of audio with various signal processing techniques have produced a wide range of tools for shaping audio data, and computer-based tools now make it possible to alter the data in any desired manner. These methods operate in the time domain, the frequency domain or the time-frequency domain. Time-domain methods use the autocorrelation technique, while frequency-domain methods use the phase vocoder, which is based on the analysis, transformation and/or synthesis of the original sound. Time-frequency domain methods are based on a constant bandwidth and on modification of the phase. The effect of time warping on speech characteristics has been studied, and its impact on speaker identification has been discussed.
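Time-domain methods rely on autocorrelation to locate pitch periods. As a minimal illustration, and not the algorithm of any particular pitch-shifting tool, the sketch below picks the lag with the largest autocorrelation inside an assumed 50 to 500 Hz pitch range. The function names and the search range are hypothetical choices; a time-domain shifter would use period estimates of this kind to decide where to splice.

```python
import math


def autocorrelation(x, lag):
    """Unnormalised autocorrelation r(lag) = sum over n of x[n] * x[n + lag]."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def estimate_period(x, fs, f0_min=50.0, f0_max=500.0):
    """Return the pitch period (in samples) as the lag with maximal autocorrelation.

    The search range [f0_min, f0_max] Hz is an assumption, not a value from the study.
    The unnormalised sum favours the shortest lag among multiples of the period.
    """
    min_lag = int(fs / f0_max)
    max_lag = min(int(fs / f0_min), len(x) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = autocorrelation(x, lag)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


if __name__ == "__main__":
    fs = 22050  # sampling rate used in this study
    # Synthetic test tone: a 220.5 Hz sine has a period of exactly 100 samples.
    x = [math.sin(2 * math.pi * 220.5 * n / fs) for n in range(2200)]
    period = estimate_period(x, fs)
    print(period, fs / period)
```

The mean F0 of a voiced segment is then simply the sampling rate divided by the estimated period.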
This extended study examines the effect of frequency-domain pitch shifting with tempo preserved on speech characteristics.
Methodology & Experimentation
Selection of Speech Material
A text containing vowels and nasals is prepared in Hindi. A total of 15 speakers, both male and female, aged 25 to 45 years, are selected and asked to read the text. Two utterances from each speaker are recorded on a semi-professional analog tape recorder. These samples are digitized at a sampling rate of 22050 Hz with 16-bit quantization in mono. The sentence of interest “
Exemplars are prepared by subjecting these samples to constant stretch ratios of 90, 95, 105 and 110 while preserving tempo. A splicing frequency of 50 Hz with 30% overlap is used for a stretch ratio of 90, a splicing frequency of 49 Hz with 29% overlap for a ratio of 95, and a splicing frequency of 47 Hz with 28% overlap for both 105 and 110. These exemplars are analyzed on the Computerized Speech Laboratory (4003B). Mean fundamental frequency (F0) and the first (F1), second (F2) and third (F3) formant frequencies at a particular location (
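The stretch settings listed above can be converted into more familiar quantities. The sketch below rests on assumptions that the paper does not state: that a tempo-preserving stretch ratio of R% multiplies every frequency by 100/R (so ratios below 100 raise the pitch, as described under Results), that the splice length is the reciprocal of the splicing frequency, and that the overlap percentage applies to that splice length. All helper names are hypothetical.

```python
import math

# Settings quoted in the text: stretch ratio (%) -> (splicing frequency in Hz, overlap in %).
SETTINGS = {
    90: (50, 30),
    95: (49, 29),
    105: (47, 28),
    110: (47, 28),
}


def pitch_factor(stretch_ratio):
    """Assumed F0 multiplier for a tempo-preserving stretch ratio given in percent."""
    return 100.0 / stretch_ratio


def semitone_shift(stretch_ratio):
    """Pitch shift in equal-tempered semitones implied by the assumed multiplier."""
    return 12.0 * math.log2(pitch_factor(stretch_ratio))


def splice_length_ms(splicing_freq_hz):
    """Assumed splice length: one cycle of the splicing frequency, in milliseconds."""
    return 1000.0 / splicing_freq_hz


def overlap_ms(splicing_freq_hz, overlap_percent):
    """Assumed overlap duration: a percentage of the splice length, in milliseconds."""
    return splice_length_ms(splicing_freq_hz) * overlap_percent / 100.0


if __name__ == "__main__":
    for ratio, (freq, ovl) in SETTINGS.items():
        print(f"ratio {ratio:3d}%: F0 x {pitch_factor(ratio):.3f} "
              f"({semitone_shift(ratio):+.2f} semitones), "
              f"splice {splice_length_ms(freq):.1f} ms, "
              f"overlap {overlap_ms(freq, ovl):.1f} ms")
```

Under these assumptions the four ratios span roughly two semitones up to two semitones down, while the splice settings stay close to 20 ms windows with about 6 ms of overlap.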
Results and Discussion
Fig.-1 shows the first (F1), second (F2) and third (F3) formant frequencies at
F2 and F3 vary more than twice as much as F1 when the stretch ratio is changed from 90% through 110%. Stretching an exemplar with a ratio of 90 or 95 raises the overall pitch: the software's algorithm either adds periods to the syllables of a word or shortens each period. Judging from the waveform, each added period is the mean of the preceding and following periods and is inserted at the center of the syllable. Similarly, stretch ratios of 105 or 110 either remove periods or lengthen the existing periods of the syllable, thereby lowering the overall pitch. The removal of periods causes a loss of formant information, and a shift in the formants is observed. Addition or deletion of periods in the syllable results in a decrease or an increase, respectively, in the silence region, even though the total duration of the exemplar remains constant. Periods are introduced or removed in such a way that the mean period decreases linearly for stretch ratios below 100 and increases for stretch ratios above 100. In time warping, by contrast, the pitch is changed by stretching or compressing the whole sample in time, so the number of periods in the syllable remains unchanged.
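The period bookkeeping described above can be mimicked with a toy model. This is an assumption-laden sketch, not the software's algorithm, which the paper does not describe: every period is scaled by R/100, whole periods equal to the mean of their two neighbours are inserted at the syllable center (or center periods are dropped) until the syllable is back within one period of its original duration, and the leftover is returned as the amount absorbed by the adjacent silence. `shift_periods` is a hypothetical name.

```python
from statistics import mean


def shift_periods(periods, stretch_ratio):
    """Toy model (assumption) of tempo-preserving pitch shift on one voiced syllable.

    periods: pitch-period lengths in ms.
    Returns (new_periods, silence_delta_ms), where silence_delta_ms is the
    leftover duration that whole periods cannot fill.
    """
    scale = stretch_ratio / 100.0
    target = sum(periods)
    new = [p * scale for p in periods]
    # Pitch raised (scale < 1): insert, at the syllable center, a period equal
    # to the mean of its two neighbours while a whole period still fits.
    while len(new) >= 2:
        mid = len(new) // 2
        extra = (new[mid - 1] + new[mid]) / 2.0
        if sum(new) + extra > target:
            break
        new.insert(mid, extra)
    # Pitch lowered (scale > 1): drop center periods until the syllable fits.
    while sum(new) > target and len(new) > 1:
        del new[len(new) // 2]
    return new, target - sum(new)


if __name__ == "__main__":
    syllable = [5.0] * 40  # hypothetical 200 Hz syllable lasting 200 ms
    for ratio in (90, 95, 100, 105, 110):
        new, silence = shift_periods(syllable, ratio)
        print(ratio, len(new), round(mean(new), 3), round(silence, 3))
```

With uniform periods the mean period scales exactly with R/100, in line with the linear trend reported above; real syllables with varying periods would only approximate it.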
The variation of F1, F2 and F3 at
The formant frequencies change in the same way in other regions as well. No such noticeable difference is observed in the fricative region.
Nasal formant frequencies measured at
Fig.-3 (a) shows the percent variation of F1, F2 and F3 with respect to mean F0 at
Fig.-4 (a) shows the percent variation of F1, F2 & F3 with respect to mean F0 for a stretch ratio of 105 at
Changing the overall pitch while preserving tempo affects the higher formant frequencies more than the lower ones, and the measurable speech parameters change linearly. The amount of change in the values of F1, F2 & F3
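For each speaker, the linearity reported in this study can be checked by regressing the percent change of a formant on the percent change of mean F0 across the stretch ratios. The sketch below is a plain ordinary-least-squares fit with Pearson's r; `percent_change` and `linear_fit` are hypothetical helper names, and the paper does not state which statistic the authors used. No measured values are included here.

```python
from math import sqrt


def percent_change(original, shifted):
    """Percent variation of a measurement relative to the original exemplar."""
    return 100.0 * (shifted - original) / original


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b, r).

    r is Pearson's correlation coefficient; |r| close to 1 indicates a
    near-linear relationship.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    b = sxy / sxx
    a = my - b * mx
    r = sxy / sqrt(sxx * syy)
    return a, b, r
```

In use, `xs` would hold the percent change in mean F0 and `ys` the percent change in F1, F2 or F3 for one speaker at stretch ratios 90, 95, 105 and 110, each computed with `percent_change` against the unmodified exemplar.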