Computer-Based Speech Therapy Using Visual Feedback with Focus on Children with Profound Hearing Impairments

ANNE-MARIE ÖSTER

Doctoral Thesis
Stockholm, 2006

Academic dissertation which, with the permission of Kungliga Tekniska Högskolan (the Royal Institute of Technology), is presented for public examination for the degree of Doctor of Philosophy in Speech and Music Communication, specialising in Speech Communication, on Thursday 15 June at 10.00 in room F2, Lindstedtsvägen 26, Kungliga Tekniska högskolan, Stockholm.

TRITA-CSC-A 2006:10
ISSN-1653-5723
ISRN-KTH/CSC/A--06/10--SE
ISBN 91-7178-399-7

KTH School of Computer Science and Communication
Department of Speech, Music and Hearing
SE-100 44 STOCKHOLM, Sweden

© Anne-Marie Öster, June 2006
Printed by: Universitetsservice US AB

Abstract

This thesis presents work in the area of computer-based speech therapy using different types of visual feedback to replace the auditory feedback channel. The study covers diagnostic assessment methods prior to therapy, the type of therapy design, and the type of visual feedback suited to different users during different stages of therapy, with the aim of increasing efficiency. The thesis focuses on individual computer-based speech therapy (CBST) for profoundly hearing-impaired children as well as on computer-assisted pronunciation training (CAPT) for teaching and training the prosody of a second language.

Children who are born with a profound hearing loss have no acoustic speech target to imitate and compare their own production with. Therefore, they develop no spontaneous speech but have to learn speech through vision, tactile sensation and, if possible, residual hearing. They have to rely on the limited visibility of phonetic features in learning oral speech and on orosensory-motor control in maintaining speech movements. These children constitute a heterogeneous group needing individualized speech therapy.
This is because their possibilities to communicate with speech depend not only on the amount of hearing, as measured by pure-tone audiometry, but also on the quality of the hearing sensation and on the use that the children, through training, are able to make of their functional hearing for speech. Adult second language learners, on the other hand, have difficulties in perceiving the phonetics and prosody of a second language through audition, not because of a hearing loss but because interference from their native language prevents them from hearing new sound contrasts.

The thesis presents an overview of reports concerning speech communication and profound hearing impairment, such as studies of residual hearing for speech processing, effects of speech input limitations on speech production, interaction between individual deviations and speech intelligibility, and speech assessment methods of phonetic realizations of phonological systems. Finally, through several clinical evaluation studies of three Swedish computer-based therapy systems, concerning functionality, efficiency, types of visual feedback, therapy design, and practical usability for different users, important recommendations are specified for future developments.

To little Majken

Acknowledgement

I would like to thank all the people who have directly or indirectly contributed to the completion of this thesis.

First of all I want to express my sincere gratitude to David House, my supervisor, for his valuable support and encouragement. His gentleness and brilliant mind have stimulated me in trying to “think right and think clear”. Thanks to his guidance it has been possible for me to see the completeness in my research. I also owe my warmest and deepest gratitude to Björn Granström, who has always been available to help and support me when I have needed advice.

My warmest thanks to Karl-Erik Spens for his valuable suggestions and comments on this thesis.
I owe him a great debt of gratitude for having introduced me to the Department of Speech Communication and Music Acoustics when I was a young student of phonetics in the early seventies. Since then we have been officemates and have shared many nice discussions and laughs, as well as many unforgettable moments when travelling around the world.

I warmly thank Gunnar Fant for having created the stimulating and pleasant atmosphere we all feel. I also want to thank Arne Risberg for his serious commitment, fruitful discussions, and thoughtful care of the former Hearing Technology Group, and Rolf Carlson for his kind attitude. Especially, I want to thank my dearest friend and colleague Eva Agelfors for her deep friendship and for being able to share with her the joy and distress of daily life.

I am proud to be a member of several stimulating teams. I thank “Vingänget”: Eva Agelfors, Lennart Andersson, Mats Blomberg, Birgit Cook, Cathrin Dunger, Kjell Elenius, Inger Karlsson, Tina Magnuson, Arne Risberg, Gorda Surjadi, and Maj-Britt Wetterling for all the hilarious moments we shared together, especially in the lovely garden of Mats. Together with “Reparationsfonden”: Eva Agelfors, Birgit Cook, Si Felicetti, Maj-Britt Wetterling, and Gunilla Öhngren Löfberg, I have enjoyed luxury and sumptuousness during our memorable trips and journeys. Finally, I want to thank the members of “Fiskmiddagsgruppen”, consisting of Eva Agelfors, Björn Granström, Peter Nordqvist, and Karl-Erik Spens, for delicious food, interesting discussions, and enjoyable company.

My sincere thanks to Kjell Elenius for his supportive attitude and for his generous invitations to “Grötgänget” when I didn’t have the time to fix my own lunch. He has been my angel of mercy during the editing of this thesis. Special thanks also to Peter Nordqvist for his cheerful manner and expert help with good solutions to all my technical problems, and to Inger Karlsson for her friendly drop-ins for a short chat.
I express my gratitude to Rebecca Hincks for proofreading the language of this thesis. Thanks to Anders Askenfelt and the administrative staff Caroline Bergling, Cathrin Dunger, Markku Haapakorpi, and Niclas Horney for their resourcefulness and backup, and to all my friends and colleagues at the Department of Speech, Music and Hearing for an inspiring working environment.

I also thank my colleagues at the Department of Linguistics, Stockholm University: Björn Lindblom (who led me into the world of phonetics), Bob McAllister, Olle Engstrand, Francisco Lacerda, and Ulla Sundberg for positive collaboration over the years.

Finally, I wish to thank the speech therapists of the Manilla School for the Deaf in Stockholm for valuable pedagogical collaboration and all other teachers who always made me welcome at their schools. I am indebted to Margaretha Andolf at the Language Unit at KTH for giving me the possibility to evaluate SpeechViewer together with some of their L2 learners. Special thanks to Cecilia Melin Weissenborn for her enthusiastic work and joyful collaboration. Enormous thanks to Ewa Bergek-Ulin and Lennart Ulin at Frölunda Data AB for combining business with pleasure and for their enthusiastic help in making my experiences and results useful for speech- and hearing-impaired children.

My loving thanks are due to my family, Björn, Charlotta, and Johan, who have given me so much love and happiness through the years, to Jonas, my favourite son-in-law, to my dear and lovely sisters Birgitta and Monica, my three nice brothers-in-law, and last but not least, my sweet little granddaughter Majken, who has inspired me to great achievements.

The thesis is based on the following publications

Bälter, O., Engwall, O., Öster, A-M., Kjellström, H. (2005). Wizard-of-Oz Test of ARTUR - a Computer-Based Speech Training System with Articulation Correction. In Proceedings of the Seventh International ACM SIGACCESS Conference on Computers and Accessibility, pp. 36-43, October 9-12, 2005, Baltimore, MD.
Engwall, O., Bälter, O., Öster, A-M., Kjellström, H. (2006). Designing the user interface of the computer-based speech training system ARTUR based on early user tests, to appear in Journal of Behavioural and Information Technology.
Eriksson, E., Bälter, O., Engwall, O., Öster, A-M., Kjellström, H. (2005). Design Recommendations for a Computer-Based Speech Training System Based on End-User Interviews. In Proceedings of the Tenth International Conference on Speech and Computers, pp. 483-486, October 17-19, 2005, Patras, Greece.
Granström, B. & Öster, A-M. (1994a). Speech synthesis for hearing-impaired persons - in research, training and communication, Proceedings from 2nd Int. Symposium on Speech and Hearing Sciences, Sept. 24-25 1994, Osaka, Japan, 49-65.
Granström, B. & Öster, A-M. (1994b). Speech synthesis for hearing-impaired persons - in research, training and communication, STL/QPSR 2-3/94, 93-111.
Öster, A-M. (1985). The Use of a Synthesis-by-Rule-System in a Study of Deaf Speech. STL-QPSR 1/1985, 95-107.
Öster, A-M. (1988a). Datorer i talundervisningen – Ett nytt hjälpmedel?, Nordisk Tidskrift för Dövundervisningen, 1, 14-19.
Öster, A-M. (1988b). Computer-based speech training, Proceedings of Speech '88, 7th FASE Symposium, Edinburgh, Book 2, 645-651.
Öster, A-M. (1989a). Studies on phonological rules in the speech of the deaf. STL/QPSR 1/89, 159-162.
Öster, A-M. (1989b). Applications and experiences of computer-based speech training. STL/QPSR 4/89, 37-44.
Öster, A-M. (1989c). Applications and experiences of computer-based speech training, Proceedings of European Conference on Speech Communication and Technology (Eurospeech), Paris, 714-717.
Öster, A-M. (1990). The effects of prosodic and segmental deviations on intelligibility of deaf speech, STL/QPSR, 79-88.
Öster, A-M. (1991). Phonological assessment of eleven prelingually deaf children's consonant production. STL-QPSR 2-3/91, 11-18.
Öster, A-M. (1992a). The speech of deaf children – Phonological assessment as a basis for speech training, Thesis work for the Licentiate Philosophy degree in Phonetics, University of Stockholm, Institute of Linguistics.
Öster, A-M. (1992b). Phonological assessment of deaf children's existing articulation skills as a basis for speech training. Proceedings of ICSLP 92, October 12-16, Banff, Alberta, Canada, 955-958.
Öster, A-M. (1995a). Principles for a complete description of the phonological system of deaf children as a basis for speech training, Profound Deafness and Speech Comm., London: Whurr Publ. Ltd., 441-461.
Öster, A-M. (1995b). Teaching speech skills to deaf children by computer-based speech training, Proceedings of 18th International Congress on Education of the Deaf, Tel-Aviv, Israel.
Öster, A-M. (1995c). Teaching speech skills to deaf children by computer-based speech training, STL-QPSR 4/95, 67-75.
Öster, A-M. (1995d). Resultat från datorbaserad röstträning med ett gravt hörselskadat förskolebarn, Nordisk Tidskrift för Dövundervisningen, 125-133.
Öster, A-M. (1996). Clinical applications of computer-based speech training for children with hearing impairment. Proceedings of ICSLP 96, 157-160. Philadelphia, USA.
Öster, A-M. (1997). Auditory and visual feedback in spoken L2 teaching, Reports from the Dept of Phonetics, Umeå University, PHONUM 4.
Öster, A-M. (1998). Spoken L2 teaching with contrastive visual and auditory feedback, Proc ICSLP, Sydney.
Öster, A-M. (1999a). Strategies and results from spoken L2 teaching with audio-visual feedback, STL-QPSR 1-2/99, 1-7.
Öster, A-M., Vicsi, K., Roach, P., Kacic, Z., & Barczikay, P. (1999b). A multimedia multilingual teaching and training system for speech and hearing-impaired children – SPECO, Proceedings of Fonetik 99, 149-152. Gothenburg, Sweden.
Öster, A-M. (2002a).
The relationship between residual hearing and speech intelligibility - Is there a measure that could predict a prelingually profoundly deaf child's possibility to develop intelligible speech? STL/QPSR Vol. 43, 51-56.
Öster, A-M., House, D., Protopapas, A., Hatzis, A. (2002b). Presentation of a new EU project for speech therapy: OLP (Ortho-Logo-Pedia), Proceedings of Fonetik 2002, QPSR, Vol. 44, 45-48.
Öster, A-M., House, D., Hatzis A., Green, P. (2003). Testing a New Method for Training Fricatives using Visual Maps in the Ortho-Logo-Paedia Project (OLP), Umeå University, Department of Philosophy and Linguistics, PHONUM 9 (2003), 2-X. Available online at http://www.ling.umu.se/fonetik2003/
Vicsi, K., Roach, P., Öster, A., Kacic, Z., Barczikay, P., Sinka, I. (1999). SPECO – A Multimedia Multilingual Teaching and Training System for Speech Handicapped Children, 6th European Conference on Speech Communication and Technology, Eurospeech '99, Budapest, 859-862.
Vicsi, K., Roach, P., Öster, A., Kacic, Z., Barczikay, P., Tantos, A., Csatári, F., Bakcsi, Zs., Sfakianaki, A. (2000). A Multimedia Multilingual Teaching and Training System For Speech Handicapped Children, International Journal of Speech Technology Vol. 3, 289-300.

Table of Contents

Abstract
Acknowledgement
The thesis is based on the following publications
Table of Contents
1. Introduction
1.1. Speech communication difficulties caused by speech perception problems
1.1.1. Hearing impairment
1.1.2. Second language learners
1.2. Objectives of the study
1.3. Thesis overview
1.4. Some basic concepts
1.4.1. The term deaf
1.4.2. Abbreviations
2. Theoretical background and literature review
2.1. Introduction
2.2. Speech communication and profound hearing impairment
2.2.1. Language skills of hearing-impaired children
2.2.2. Classifications of hearing impairments
2.2.3. Speech acquisition and profound hearing impairment
2.2.4. The perception of speech through vision
2.2.5. Age of onset and speech quality
2.2.6. Speech quality of profoundly hearing-impaired children
2.2.7. Prerequisites for severely and profoundly hearing-impaired children to develop intelligible speech
2.3. Need, aim, and possibilities of speech therapy
2.3.1. Moderate hearing-impaired children
2.3.2. Prelingually severe and profoundly hearing-impaired children
2.3.3. Adult second language learners
2.3.4. Speech therapy aims
2.3.5. General steps in an individual therapy
3. Development of speech technology in speech therapy for severely and profoundly hearing-impaired children
3.1. Introduction
3.2. Process and product-oriented therapy systems
3.3. Tactile aids
3.4. Product-oriented visual aids
3.4.1. Feature based visual indicators
3.4.2. Vowel and fricative displays
3.4.3. Formant displays
3.4.4. Fundamental Frequency displays
3.4.5. Computer-based speech therapy systems with visual feedback
3.5. Process-oriented therapy systems with visual feedback
3.5.1. Physiological devices
3.5.2. Automatic speech tutors
3.6. Spoken language training for L2 speakers
4. Residual hearing for speech processing - methods of investigating the functional hearing for speech
4.1. Introduction
4.2. Functional hearing
4.3. Functional hearing and speech intelligibility
4.3.1. Subjects
4.3.2. Intelligibility test
4.3.3. Listeners
4.4. Results
4.4.1. Speech intelligibility scores of 11 profoundly hearing-impaired children
4.4.2. The effect of listeners’ experience
4.4.3. Relation between amount of residual hearing and speech intelligibility
4.4.4. Relation between shape of audiogram and speech intelligibility
4.4.5. Relation between functional hearing and speech intelligibility
4.5. Functional hearing and speech perception tests
4.5.1. Introduction
4.5.2. Decisive factors for speech tests with small children
4.5.3. Test construction
4.5.4. Preliminary results
4.5.5. Conclusion
5. Effects of speech input limitations on speech production
5.1. Deviations in the speech of moderate hearing-impaired children
5.2. Deviations in the speech of profoundly hearing-impaired children
5.2.1. Factors that cause deviations in the speech of profoundly hearing-impaired children
5.3. Deviations in the speech of L2 learners
6. Interaction between individual deviations and speech intelligibility
6.1. Introduction
6.2. The effects of individual deviations on speech intelligibility measured by means of synthetic speech
6.2.1. Introduction
6.2.2. Assessment of the speech of three profoundly hearing-impaired children
6.2.3. Listening test
6.2.4. Results
6.2.5. Conclusions
7. Phonetic realizations of phonological systems
7.1. Introduction
7.2. Speech assessment methods
7.2.1. Phonetic error analysis
7.2.2. Phonological analysis
7.3. Description of a phonological analysis of a profoundly hearing-impaired child’s consonant production
7.3.1. Step 1: Analysis of the existing articulation skills
7.3.2. Step 2: Assessment of the usage of the existing articulation skills through a detailed phonetic analysis
7.3.3. Step 3: Assessment of idiosyncratic realisations of phonological contrasts and regular error patterns
7.4. The importance of a detailed phonetic transcription
7.5. Therapy based on existing skills
7.6. Description of a phonological analysis of a Bosnian speaker’s production of Swedish vowels
7.7. Conclusions
8. Design of visual feedback in Swedish computer-based therapy systems
8.1. Introduction
8.2. Visual feedback in speech therapy
8.2.1. Nature of feedback
8.2.2. Type of feedback
8.3. The IBM SpeechViewer
8.3.1. Description of the system
8.3.2. Type of visual feedback in different exercises
8.4. Box of Tricks
8.4.1. Description of the system
8.4.2. Databases for reference speech and normal-hearing children for similarity comparisons
8.4.3. Training method used in Box of Tricks
8.4.4. Type of visual feedback in different exercises
8.5. OLP (Ortho-Logo-Paedia) Therapy
8.5.1. Introduction
8.6. System components
8.6.1. The user interface OLPy
8.6.2. OPTACIA
8.6.3. GRIFOS
8.7. Conclusions
9. Clinical evaluation studies of the three systems
9.1. Introduction
9.2. Clinical evaluation of SpeechViewer with profoundly hearing-impaired children
9.3. Clinical evaluation of SpeechViewer with L2 learners
9.3.1. Diagnosis of individual deviations
9.3.2. Instructions that aimed at awareness, correct realisation, and understanding
9.3.3. Further training to establish automaticity and transfer to untrained material
9.3.4. Results of a questionnaire with the thirteen L2 speakers
9.4. Evaluation studies with Box of Tricks
9.4.1. Results of the Swedish questionnaire
9.4.2. Results of the Slovenian questionnaire
9.4.3. Results of the Hungarian questionnaire
9.4.4. Clinical evaluation of the Hungarian version of Box of Tricks
9.4.5. Clinical evaluation of the Slovenian version of Box of Tricks
9.4.6. Clinical evaluation of the Swedish version of Box of Tricks
9.5. Clinical evaluation of the Swedish version of the OLP method with hearing-impaired children
9.5.1. Introduction
9.5.2. Method
9.5.3. Subjects
9.5.4. Treatment
9.5.5. Therapy objectives
9.5.6. Assessments
9.5.7. Results
9.6. Conclusions
10. General Conclusions and Recommendations
10.1. Recommendations
10.1.1. Important demands on a visual computer-based speech therapy system
10.1.2. Efficiency of visual feedback of prosodic parameters within spoken L2 training
10.2. Comparison of the three systems
10.3. Recommended therapy design
10.3.1. General design of computer-based speech therapy with visual feedback
10.3.2. Structural training design
10.4. Visual feedback strategies
10.4.1. Type of visual feedback for severely and profoundly hearing-impaired children
10.4.2. Type of audio-visual feedback for L2 learners
10.4.3. Use of automatic speech recognition and spectral comparison of phonemes with profoundly hearing-impaired children
10.5. Conclusions
11. References
12. Appendices
12.1. Appendix 1: Diacritics to assess the speech of profoundly hearing-impaired children
12.2. Appendix 2: Swedish SAMPA symbols
12.3. Appendix 3: Swedish questionnaire for evaluation of Box of Tricks

1. Introduction

Technically advanced computer-based speech therapy systems with visual feedback are currently available for use as a complement to traditional methods to assist children with speech and hearing impairments in perceiving and producing speech. Visual feedback has been shown to be an efficient substitute for the missing auditory feedback in profoundly hearing-impaired children (Osberger et al., 1981; Watson, Reed, Kewley-Port, Maki, 1989; Arends et al., 1991; Yamada & Murata, 1991; Levitt, 1993; Javkin, 1994; Rooney et al., 1994; Öster 1996, 1999a; Öster et al. 1999b, 2002, 2003). It has also been shown to be a valuable supplement to audio and verbal feedback in speech training for moderately and severely hearing-impaired children, for normally hearing children with speech deviations, and for second language (L2) learners.

Generally, in spoken language training with L2 learners, visual feedback is seldom used. Training is frequently carried out through traditional record-and-play-back models (Hincks, 2005) that provide audio feedback. According to Neri et al. (2002, p. 443) there is “…a general tendency to neglect pronunciation in favour of grammar and vocabulary in research on second language acquisition.” Speech-interactive and self-assessed language learning systems focus on the implementation of automatic speech recognition (ASR) providing verbal feedback (Eskenazi, 1999). ASR detects segmental errors fairly well but has problems identifying prosodic errors such as deviantly produced intonation, stress patterns and fluency.
However, visual feedback has been shown to help L2 learners perceive and produce new stress patterns and intonation of a target language in particular (Flege, 1989; Öster 1998), a fact that supports a wider usage of visual feedback of prosodic parameters in spoken language training within second language learning.

In a computer-based speech therapy aid, visual feedback of acoustical parameters is used to give the client possibilities to evaluate his/her own speech compared to a target of accepted production. By using different amusing drawings to illustrate loudness, pitch contour, spectral distribution, etc., the child’s attention can be drawn to important parameters in the speech, which can help the child to recognise whether his/her pronunciation is improving or not. Such an advanced technical aid can assist the teacher and help the client follow three very important steps in speech learning: instruction, training, and generalization.

Especially in the instruction phase of speech therapy, immediate and meaningful visual feedback can help a child to be aware of the manner and place of articulation as well as of distinctive contrasts between similar speech sounds. Many distinctive features, such as nasality/non-nasality, are invisible through lip-reading and consequently difficult to produce correctly. With this technique it is also easier for the therapist to instruct and explain what is wrong and what is correct in the child’s production.

Motor learning theory in speech development also indicates that accurate feedback and repeated practice are essential to establish automaticity and linguistic use. This is the most important element in a speech therapy program but the most difficult for a therapist to carry out. The target production must be repeated and practiced in a variety of contexts.
Using computer-based speech therapy with visual feedback in this situation is particularly helpful and motivating for the child in carrying out significant amounts of additional training.

However, despite highly developed technical systems that offer flexible training, the result of computer-based speech therapy (CBST), especially regarding profoundly hearing-impaired children, is not always as successful and promising as could be expected (Bench, 1992). Reasons could be that the therapy is not always based on a clinical-phonetic point of view, is not adjusted to the individual child’s needs, and does not pay attention to how speech sounds are realised in spoken language. The importance of such an individualised assessment was stressed by, for example, Stoel-Gammon and Dunn (1985). They stated that it is necessary to identify the unique characteristics of each child's system in order to design the most appropriate treatment plan for each child. Tanner Dyson (1988) as well as Saben and Ingham (1991) claimed that it is important to concentrate on the usage of a child's existing articulation skills, that is, the speech sounds that the child knows how to produce correctly in isolation without knowledge of their meaning and linguistic use, to identify the deviations made by the child. This is particularly important when it comes to severely and profoundly hearing-impaired children’s speech as it contains many idiosyncratic realisations of phonological contrasts.

Traditionally, the speech of profoundly hearing-impaired children is assessed using a phonetic error analysis only. The articulation skill is then compared sound by sound with that of normally hearing adult speakers (Tanner Dyson, 1988) without regard to phonetic contexts or to the production’s contrastive function in a specific language.
Such an analysis uses a coarse phonetic transcription that misses important articulatory details and provides information only on what the child is not capable of articulating. Hence, only speech sounds that the child never articulates correctly will be treated. If instead a phonological assessment based on a very detailed phonetic analysis is used, the child's individual system can be described by identifying all segments used by the child with the help of IPA symbols and special diacritics. In this way answers could be provided to questions such as: How successfully can the child produce speech contrasts? Might a deviant production in fact be a realisation of a phonological contrast? In what ways do the phonetic elements used for contrastive function differ from those used in normal speech?

Hence, basing an individual speech therapy program on a phonologically based assessment that investigates a child's unique phonological system (phonetic and phonemic inventories) as well as the linguistic use of the existing articulation skills might contribute to improving the results from computer-based speech therapy (Öster 1991, 1992a, 1992b). With the help of easily understandable visual feedback strategies the child’s attention can be drawn to deviantly realized phonetic features, and correct pronunciation can be trained with specially designed encouraging visual feedback in order to improve the intelligibility. Perhaps then successful training might come about with the help of an advanced technical computer-based speech therapy system with specially designed visual feedback.
Another problem is that the existing computer-based speech therapy systems developed for the Swedish language use different kinds of visualisations of important speech parameters, correct from the acoustic-phonetic point of view but not always useful and understandable for all kinds of users, such as children, adults, clients with a hearing loss, normally-hearing clients with speech defects, and second language learners. Furthermore, fundamental principles, reference speech, target spectra, methods for contrastive training, and objective evaluation techniques differ between the systems and are not appropriate for all client groups. Recommendations and evaluations of what type of visual feedback strategy and therapy structure would be appropriate for different user groups are seldom made. Hence a further explanation for the sometimes limited and varied progress reported from computer-based speech therapy/training could be the neglect of information about what kinds of visualisations and therapy structures would be most viable to use with individual client groups.

1.1. Speech communication difficulties caused by speech perception problems

1.1.1. Hearing impairment

Auditory feedback is necessary for the speech development of normally-hearing children. The child can hear his own production, compare his own production with that of others and correct his production little by little. Children with a moderate hearing loss develop spontaneous speech, but their pronunciation often suffers from distortions and lack of articulatory precision. The degree of intelligibility of their speech production is related to their hearing impairment. More impaired articulation accompanies higher hearing loss (Boothroyd, 1984; Levitt & Geffner, 1987; Ling, 1976; Smith, 1975; Markides, 1985).
Considering hearing loss as a single filter (not compensating for other disordered perceptual processes), deviant articulation may be predicted from the pure tone audiogram. However, other factors have to be taken into consideration when predicting the intelligibility of the speech of moderately hearing-impaired children. The children may have become hearing-impaired at different ages, been exposed to different educational methods, have had different social and psychological backgrounds and have had different use of hearing aids.

However, for profoundly hearing-impaired children, born with hearing losses above 90 dBHL, the degree of correlation is reduced and their intelligibility cannot be predicted from the pure tone audiogram (Monsen, 1978). These children have no acoustic speech target to imitate and compare their own production with when learning speech. Therefore, they develop no spontaneous speech but must learn speech through vision, tactile sensation and, if possible, residual hearing. Other senses must replace the auditory feedback that hearing children use when they learn to speak. By relying on vision and the limited visibility of phonetic features in learning oral speech, they can establish an orosensory-motor control of their speech movements and their acoustic output.

The limited speech perception potentials of these children cause them to make unavoidable phonetic deviations, which affect their ability to signal meaning differences in spoken language. The speech that these children develop varies from poor to rather good, showing typical and systematic deviations as well as unusual and non-standard features, owing to the fact that their phonological systems are built up from visually detectable distinctions between different phonemes. Hence, a phonological system acquired by “hearing by eye” will differ from the norm on both the phonetic and the phonological level (Dodd, 1988).
The aim of speech therapy for severely and profoundly hearing-impaired children is that the children should be helped to acquire new speech patterns and develop intelligible speech for the purpose of making statements and, more seldom, for the purpose of communication. They should learn a “survival speech” that makes it possible for them to give and to understand simple messages in shops, in the street, in a hearing society, etc. For moderately hearing-impaired children, who rely on auditory information when learning speech, therapy is more directed towards speech correction than speech acquisition.

1.1.2. Second language learners

Adults learning Swedish as a second language are another group that has difficulties in perceiving the phonetics and prosody of a second language through audition, not because of a hearing loss but because interference from their native language prevents them from hearing new sound contrasts. Research has indicated that this group fails to perceive difficult and new contrasts in the target language by ear alone (Jamieson, 1995; McAllister, 1995; Flege, 1998). Age of first exposure, motivation, first language (L1), amount of use of L1 and L2, etc. are some of the factors that affect L2 speakers’ pronunciation (Piske, MacKay & Flege, 2001; Neri, Cucchiarini, Strik & Boves, 2002). Typical pronunciation difficulties for a given target language will differ for speakers of different native languages (Dalby & Kewley-Port, 1999). Sounds that are similar but not identical cause the most serious problems both in perceiving and producing them. Flege’s Speech Learning Model (SLM) is stated as follows: “L2 features not used to signal phonological contrast in L1 will be difficult to perceive for the L2 learner and this difficulty will be reflected in the learner’s production of the contrast based on this feature” (McAllister et al., 2002, p. 230).
L2 learners make pronunciation errors of two types (Eskenazi, 1999). They articulate some target phonemes deviantly because these differ in number and quality from those of their mother tongue (L1), and they use prosodic parameters such as intonation, duration, and stress placement inappropriately. Speech therapy for L2 speakers aims at eliminating misarticulated speech patterns to achieve a native-like production and to improve the consistency of production.

1.2. Objectives of the study

This study investigates some conceivable reasons for the sometimes limited and varied progress reported from computer-based speech therapy/training with visual feedback. Hypotheses are presented with reference to diagnostic methods, visual feedback strategies and therapy design for two client groups, namely children with severe and profound hearing losses and second language learners.

The first aim is to study the importance of an individual phonological assessment as a diagnosis to base the training on and to make the best use of visual feedback. Such an assessment investigates the linguistic usage of the client and assesses in what way the phonetic elements used differ from the phonetic elements used for contrasts in normal speech. The hypothesis is that without a phonological assessment based on a very detailed phonetic analysis, therapy may result only in a series of meaningless “articulation gymnastics” sessions.

The second aim is to investigate and give some recommendations on type and design of visual feedback for these client groups, which have different speech processing capabilities and rely on different senses for speech input/output. Furthermore, the intention is to give recommendations on therapy design for these client groups derived from their different aims and needs of therapy.
The third aim is to test the efficiency of visual feedback of prosodic parameters within spoken L2 training to investigate whether it is worthwhile to use as a general tool in computer-assisted pronunciation training (CAPT) within second language learning.

1.3. Thesis overview

This monograph comprises 10 chapters. Chapter 1 is the introduction, which addresses the goals of the study and reports on the speech communication difficulties of profoundly hearing-impaired children and second language learners caused by speech perception problems.

Chapter 2 reviews the theoretical background and the literature on different aspects of speech communication and profound hearing impairment, such as speech acquisition, speech quality, and different classifications of hearing impairment, as well as the need, aim and possibilities of speech therapy for hearing-impaired children and for adult second language learners.

Chapter 3 presents the development of speech technology in speech therapy from early visual aids to more technically advanced computer-based systems using visual feedback.

Chapter 4 deals with the relation between residual hearing capabilities for speech (the functional hearing) and intelligibility of the speech of profoundly hearing-impaired children. Results of a speech intelligibility test of 11 children are reported that investigate the relationship between the children’s speech intelligibility and different factors such as mean hearing loss, shape of audiograms, and functional hearing. An analytical computerized speech perception test for small children, developed in an effort to provide a screening method that can predict a child’s ability to develop intelligible speech, is also described. The result of the test also provides an early diagnosis of difficulties to base the therapy on.
Chapter 5 reports on general deviations in the speech of profoundly hearing-impaired children and adult second language learners and discusses why they occur.

Chapter 6 reports on a study that investigated how individual deviations interact in their effect on overall speech intelligibility. This was measured through generated synthetic speech made up from three profoundly hearing-impaired children’s individual vowel, consonant and prosodic deviations that were artificially corrected towards normal speech through digital manipulation. The effects of the corrections of the different deviations on intelligibility were measured through listening tests that provided important results to be used in speech therapy.

Chapter 7 emphasizes the importance of phonologically assessing in what way profoundly hearing-impaired children’s phonetic realizations of Swedish phonology differ from the normal model and in what way they, as well as adult second language learners, realize their existing articulation skills in linguistic contexts. The results of such an analysis are necessary for the planning and efficiency of an individually performed computer-based speech therapy. Such a phonological assessment, applied to the speech production of both a profoundly hearing-impaired child and a Bosnian speaker, is described.

In Chapter 8 the designs of the different types of visual feedback used in three Swedish computer-based speech therapy systems are exemplified, compared, and examined in relation to their potential to be effective, attractive, easy to comprehend, user-friendly, motivational, instructive, and logical to the user. Different clinical evaluation studies made with the three systems are reported in Chapter 9.
Finally, in Chapter 10, general conclusions are discussed on the efficiency of computer-based speech therapy using visual feedback for profoundly hearing-impaired children and of audiovisual feedback for training perception and production of Swedish prosody within spoken L2 training. Recommendations regarding therapy design and visual feedback strategies for the two groups and for the different stages during therapy are specified.

1.4. Some basic concepts

1.4.1. The term deaf

The term deaf in Sweden covers a number of different groups of persons, differing in etiology, degree of hearing loss and articulation skills. To be identified as either deaf or profoundly hearing-impaired is nowadays more a social than a biological matter. Deaf persons are dependent on sign language and use it in their everyday communication. Those who prefer to communicate through sign language belong to the deaf community irrespective of how great the hearing loss is (Larsson, 1997), in contrast to those who prefer to speak, lip-read other people and use technical aids. The latter group is considered to be hearing-impaired. However, nowadays many hearing-impaired children learn sign language and are thanks to that bilingual.

In many studies dealing with differences between the speech of students who are deaf and of those who are hard-of-hearing, especially outside Sweden, “hard-of-hearing” typically describes children with mild and moderate losses and “deaf” usually refers to children with severe and profound hearing loss (Yoshinaga-Itano and Sedey, 2000). Here the term profoundly hearing-impaired is used instead of deaf throughout the thesis and refers to the Swedish children who join the speech clinic because they are motivated and have possibilities to develop an intelligible "survival speech" for the purpose of making statements and, more seldom, for the purpose of communication.
The aim of speech therapy in Sweden is to make it possible for them to give and to understand simple messages in shops, in the street, etc. However, the term deaf is used in the literature review and in Chapter 9 by the Hungarian and Slovenian partners in their evaluation studies of Box of Tricks.

1.4.2. Abbreviations

ASR     Automatic speech recognition
CALL    Computer-assisted language learning, for teaching and training a second language.
CAPT    Computer-assisted pronunciation training, for teaching and training the pronunciation of a second language.
CBST    Computer-based speech therapy for teaching and training speech production of children with speech and hearing impairments.
dBHL    Hearing loss is measured in decibels hearing level
L1      Mother tongue
L2      Second language
MRI     Magnetic resonance imaging
PTA     Pure-tone averages
SAMPA   Speech Assessment Methods Phonetic Alphabet is a machine-readable phonetic alphabet. It was originally developed under the ESPRIT project 1541, SAM (Speech Assessment Methods) in 1987-89, see Appendix 2
(sj)    Example of the orthographic representation of a speech sound
/S /    Example of the IPA transcription of a phoneme
[Ó]     Example of the IPA transcription of the articulatory aspect of a speech sound.

2. Theoretical background and literature review

2.1. Introduction

The speech of severely and profoundly hearing-impaired children exhibits wide variations in degree of intelligibility and should be looked upon as a special idiolect that is dependent on speech reception capabilities and on other factors that these children rely upon when learning to speak. Profoundly hearing-impaired speakers often try to realize the visual representation of a phonetic contrast, which is normally signaled auditorily (Monsen, 1976).
This means that a deviant production, for instance a silent articulation for /f/ (articulation present but no air-stream) or a stop with a non-audible release, may be an attempt to realize a speech sound. Comparing the articulatory skill of profoundly hearing-impaired children with that of normal-hearing children is inappropriate as the quality of the articulation is relatively unimportant as long as the speech is intelligible. The intelligibility of the speech of profoundly hearing-impaired children will depend on the extent to which the phonological system and the phonetic realization of the system resemble the norm of the language users in general. This can only be assessed through a phonological analysis that concentrates upon regularities in the pronunciation used in spoken language and the description of these rules. A phonological description looks for underlying rules in a child's speech, and specifies any idiosyncratic realization of phonological contrasts through an analysis of a detailed phonetic transcription.

This chapter reviews earlier studies of profoundly hearing-impaired children’s speech production. The aim is to give a background to the statement that the speech of severely and profoundly hearing-impaired children is unique and to explain why an objective phonological analysis of their speech is required. The intention is also to give a background to the fact that a special visual feedback strategy and therapy design must be used during speech therapy with severely and profoundly hearing-impaired children, in contrast to what should be the most useful visual strategy in spoken language training with L2 learners.

2.2. Speech communication and profound hearing impairment

Language can be written, spoken, signed and even symbolized depending on people’s needs and capacities to communicate.
People with severe motor disabilities use Augmentative and Alternative Communication through different kinds of symbols for social interaction and education, like the Morse code, Bliss symbols and other picture systems. Communicating through written language involves both reading and writing competences, but communicating successfully with speech requires good hearing to be able to perceive, understand and produce speech. Deaf persons who are deficient in hearing communicate through sign language, a visual, gestural, and spatial mode that they use in their everyday communication.

Children who are born with a severe hearing loss have of course a decreased ability to learn speech through an auditory-verbal approach compared to normally hearing children. The limited auditory feedback not only affects profoundly hearing-impaired children’s speech perception potentials but also reduces the opportunity for self-correction and for comparing their own speech quality with that of those closest to them. However, according to Finnerty (1996) and Loughlin (2005), a hearing loss also affects certain other important factors, in both the listener and the speaker, that are necessary for satisfying speech communication to be possible. These factors are:

• Auditory feedback
• Physiological control
• Knowledge of the speech code
• Identical references
• Interpretation
• Rich experience (ability of guessing)
• Awareness (interpret sounds as meaningful)
• Non-verbal skills (such as pointing, showing, gazing, head nod/shake, facial expression, eye contact, crying, imitation and vocalising)

Profoundly hearing-impaired children often suffer from disorders of physiological control like deviant respiratory patterns as well as breathiness, voice breaks, unstable pitch, nasality and vocal fry.
This is due to a restricted use of the vocal apparatus and the fact that they have to learn oral language by a laborious visual imitation of speaking (Grewel, 1963). Non-visible speech elements such as nasality, voicing, and fundamental frequency can be related to typical deviations in the speech of profoundly hearing-impaired children (Martony, 1971; Monsen, 1976; Öster 1992b).

The knowledge of the speech code, i.e. the knowledge of how linguistically defined units are realised in the speech act (Fant, 2001), and the possibility of identifying speech segments are of significant importance for speech communication. Severely and profoundly hearing-impaired children have incomplete knowledge of the sound system of the target language and have difficulties in interpreting sounds as meaningful, understanding the contents of a message, and using the target language in different situations. Furthermore, the speaker and the listener must have identical references to be able to understand each other. All these important aspects of speech communication must be learnt with the help of a therapist mainly through the visual and kinaesthetic channels.

A rich linguistic environment and regular exposure to speech lead to a rich experience of the language used. This implies better possibilities of guessing when the listening conditions are unfavourable. According to Furth (1964) profoundly hearing-impaired children possess normal cognitive processes but suffer from general restrictions in experience, interactions, and opportunities to learn spoken language.

2.2.1. Language skills of hearing-impaired children

A determining condition for successful speech communication to take place is to know and to master the structure of the phonology, the morphology and the syntax. Limited research has been done on the language skills of hearing-impaired children or on the underlying processes (Bamford and Saunders, 1991).
Normal language development has been used as a model to tell what is wrong in the language of hearing-impaired children. Longitudinal development studies have been rare. Unfortunately, most published international studies contain data from subjects representing a whole range of hearing losses from mild to profound in degree. Background data that may affect language acquisition have not always been taken into consideration. The children may have become hearing-impaired at different ages, been exposed to different educational methods, have had different social and psychological backgrounds and have had different use of hearing aids. However, the language development of children with mild hearing losses often does not obviously differ from that of normal hearing children in its main features, but it may be delayed.

A broad summary based on different studies of how the language of prelingually hearing-impaired children might differ from that of normally hearing children is given by Bamford and Saunders (1991). It was reported that hearing-impaired children use:

• reduced numbers of words (tokens)
• reduced numbers of different words (types)
• reduced numbers of grammatically correct sentences
• shorter sentences
• simpler and generally stereotyped sentence constructions
• more content words, particularly nouns and verbs
• reduced numbers of conjunctions and prepositions
• frequent errors of word order

Borg et al. (2002) established a reference material for a language test for hearing-impaired children, “LATHIC”. The test is made for children in the age range 4 - 6 years with pure tone averages of 80 dBHL or less and with spoken Swedish as the first language. The LATHIC test battery consists of eight subtests: mental development, speech reception in noise, segmental phoneme discrimination, phonological short-term memory, phrasal prosody, phonology, speech motor functions and phoneme mobilisation.
In addition the child’s certainty and co-operation are judged in each test. The most important result of the analysed material was that children with hearing impairment greater than 60 dB (also unilateral) had a delayed language development compared to normal hearing children of the same age. The delay increased with larger losses and decreased for older children, and no differences were found between boys and girls. The total number of tested children was almost 400, including 87 normal-hearing children who participated as a control group. Of the hearing-impaired children, 199 completed all tests and constituted the analysed material.

2.2.2. Classifications of hearing impairments

Pure tone threshold audiometry, using air and bone conduction, is a measurement of the hearing loss a child suffers from (Lidén, 1985). It measures the auditory loss at several frequencies by pure tones, generally at 125 Hz, 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz and 6000 Hz. Each sound frequency is presented to the ear with the help of a headset and is gradually increased in intensity until the child can hear it. The result of pure tone audiometry indicates the dividing line between hearing and not hearing and gives a confirmation of a child’s hearing sensitivity, see Figure 2.1.

Figure 2.1. Pure tone audiogram showing a person’s threshold to steady-state pure tones.

Pure tone average (PTA) is a method used to summarise the results and presents a child’s hearing sensitivity and hearing handicap in a practical way. It is given by averaging the air conduction thresholds for the frequencies 500, 1000, and 2000 Hz, which are considered to be the major and important frequencies for speech. Various classifications and degrees of hearing impairment have been presented over the years.
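The pure tone average described above is a plain arithmetic mean over three thresholds, and the degree labels of Stach (1998) in Table 2.1 are bands on that mean. The following is a minimal sketch only, not part of any system evaluated in this thesis. The function names, the dictionary layout of the audiogram and the example values are hypothetical, and the treatment of a value that lies exactly on a class boundary is an assumption, since the published ranges share their end points.

```python
# Frequencies (Hz) that the text names as the basis of the pure tone average.
PTA_FREQUENCIES_HZ = (500, 1000, 2000)

# Upper bounds (dBHL) and degree labels after Stach (1998), Table 2.1.
# Assumption: a value exactly on a boundary is placed in the lower class.
STACH_DEGREES = (
    (10, "Normal"),
    (25, "Minimal"),
    (40, "Mild"),
    (55, "Moderate"),
    (70, "Moderately severe"),
    (90, "Severe"),
)


def pure_tone_average(thresholds_dbhl):
    """Average the air conduction thresholds at 500, 1000 and 2000 Hz.

    thresholds_dbhl maps frequency in Hz to threshold in dBHL for one ear.
    """
    missing = [f for f in PTA_FREQUENCIES_HZ if f not in thresholds_dbhl]
    if missing:
        raise ValueError(f"missing thresholds for {missing} Hz")
    values = [thresholds_dbhl[f] for f in PTA_FREQUENCIES_HZ]
    return sum(values) / len(values)


def degree_of_hearing_loss(pta_dbhl):
    """Return the degree label whose band contains the given PTA."""
    for upper_bound, label in STACH_DEGREES:
        if pta_dbhl <= upper_bound:
            return label
    return "Profound"  # more than 90 dBHL


# Hypothetical audiogram for one ear (values invented for illustration).
example_audiogram = {125: 60, 250: 70, 500: 85, 1000: 95, 2000: 105, 4000: 110}
example_pta = pure_tone_average(example_audiogram)
print(example_pta, degree_of_hearing_loss(example_pta))
```

Thresholds outside 500, 1000 and 2000 Hz are ignored by the average, which is why two children with the same PTA can still have audiograms of very different shape.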
Traditionally, a classification of a hearing loss was based on how the impairment affects the ability to understand speech (Lidén, 1985). Stach (1998) illustrated the communicative effect of a hearing loss through a classification, shown in Table 2.1, based on the mean of the frequencies 500, 1000, and 2000 Hz.

Table 2.1. Classification of degree of hearing loss and impact on communication (Stach, 1998).

dBHL         Degree               Communicative effect
-10 to 10    Normal               None
10 to 25     Minimal              Difficulty hearing quiet speech in presence of noise
25 to 40     Mild                 Difficulty hearing quiet or distant speech, even in quiet
40 to 55     Moderate             Conversational speech is audible if at close distance
55 to 70     Moderately severe    Loud conversational speech is audible
70 to 90     Severe               Conversational speech is not audible
>90          Profound             Loud sounds may be audible

Important information about a child’s possibilities to perceive the information-bearing elements in the acoustic speech signal can be provided when the child’s pure tone audiogram is studied in relation to the speech banana, shown in Figure 2.2. The speech banana is taken from the measurements of Fant (1959). It is a section on the audiogram (with the shape of a banana) that covers the frequency area in which different speech sounds are produced. The limit between hearing and vibration in the low frequency range, the intensity of one’s own speech and the threshold of discomfort are also indicated in Figure 2.2.

Figure 2.2. The speech banana showing the most important area for the perception of the speech sounds (Fant, 1959).

Martony and Risberg (1970) developed a method for classifying audiograms to be used when estimating hearing aid selection, methods of teaching, and results of speech training with different aids, see Figure 2.3. The method is based on degree of the hearing loss and the shape of the audiogram.
By this method the audiograms are divided into a low frequency area from 125-1000 Hz and a high frequency area from 2000-8000 Hz. These two areas are then divided into 0-D for low frequencies and 0-6 for high frequencies. Consequently, by a combination of a letter and a digit, coarse information about the quality of the residual hearing and the shape of the pure tone audiogram is given.

Figure 2.3. Audiogram classification according to Risberg & Martony (1970) that gives information about quality of the residual hearing and shape of the audiogram.

A classification may also be based on how the impairment affects the ability to develop speech (Risberg, 1979). It is based on the speech banana and the average thresholds (PTA) of the frequencies 500, 1000, and 2000 Hz (Table 2.2). However, the sort of speech a hearing-impaired child develops depends not only on the amount of hearing, as measured by pure-tone threshold audiometry, but also on the quality of the hearing sensation and the use the child through training is able to make of his/her residual hearing.

Several studies have shown that there is a close relationship between degree of hearing impairment and the intelligibility of hearing-impaired children’s speech. More impaired speech accompanies higher hearing loss (Boothroyd, 1984; Levitt & Geffner, 1987; Ling, 1976; Smith, 1975; Markides, 1985).

Table 2.2. Classification of how a hearing loss affects the ability of developing speech (Risberg, 1979).

Threshold: < 39 dBHL; 40-54 dBHL; 55-69 dBHL; 70-89 dBHL; > 90 dBHL
Ability to develop speech (entries in order of increasing hearing loss): Some delay and some deviations. Pronounced effect. Need for special education and hearing aids. Hearing aids and speech correction necessary. Delayed speech development. Need for speech learning and speech correction. Benefit from sign language. Sign language as first language. Speech development related to functional hearing. Reliance on visual cues.

On the average, speech intelligibility decreases with increasing hearing loss up to a loss of about 90 dB. Above that the degree of correlation is reduced. However, Monsen (1978) pointed out that a good audiogram may correlate quite consistently with good speech; but, on the other hand, children with more profound hearing losses may commonly span the whole range from very intelligible to quite unintelligible speech. Hence, the intelligibility of the speech of children with pure-tone averages greater than 90 dB cannot be predicted from the degree of hearing loss, as measured by pure-tone audiometry.

This was shown in a study by Adolvsson and Forsén (1968) where the intelligibility of the speech of hearing-impaired children from the 8th grade of the Manilla and Alvik schools in Stockholm was investigated, see Figure 2.4. The intelligibility of the children's speech was rated as intelligible (3), intelligible without difficulties (2), intelligible with difficulties (1), and unintelligible (0). The figure shows that on the average, speech intelligibility decreases with increasing hearing loss (mean level at 500, 1000, and 2000 Hz) until a loss of about 90 dB. For hearing losses greater than 90 dBHL the degree of correlation is reduced. Vastly varying intelligibility scores from poor to rather good speech were shown for children with pure tone averages above 90 dBHL.

Figure 2.4. Rated intelligibility compared to hearing level of hearing-impaired children (Adolvsson and Forsén, 1968). (Axis label: pure tone average, dBHL.)

Speech audiometry is an important qualitative complement to pure-tone audiometry. Recorded test words are transmitted to the patient with the help of a headset or a speaker. Speech audiometry can be global or analytical.
The aim of a global examination is to determine the ability to understand speech in everyday use, while an analytical examination determines the ability to perceive single segments and suprasegmentals (Arlinger & Hagerman, 1997).

2.2.3. Speech acquisition and profound hearing impairment

The hearing child perceives speech and language spontaneously through a combination of hearing and visual information, see Figure 2.5. Lip-reading is no longer seen only as a compensatory information channel to substitute for hearing. Mogford (1983), for example, has shown that visual speech information is implicated in normal processes of speech acquisition. The talking face is important from an early age in gaining and holding the attention of the listener. Wundt (1911) called the observation of the talking face "the impulse to speech". Young children observe lip movements to find out how to articulate different speech sounds and then imitate these movements. Visual articulatory movements give concrete cues about how to produce speech sounds, while hearing offers possibilities to control, compare and modify one’s own production according to the surrounding pronunciation and linguistic usage. The hearing and sighted child will learn those sounds that have clearly visible articulation more quickly and with fewer errors than those with non-visible articulation. In time the hearing child is able to interpret what is being said through hearing alone, without having to watch the speaker. According to Mahshie (1996, p. 153), “they develop an internal mode of articulatory/phonatory outcomes or aims”. In general, the production of sounds at the phonetic level, developed from comparison of others’ and one’s own articulatory patterns, precedes their meaningful use at the phonological level (Ling, 1976).
A phonetic description shows how a child articulates speech sounds and which speech sounds are present in his existing articulation skills, without regard to their meaning and linguistic use. As the hearing child becomes aware of the relationship between his own speech and that of others, he gradually uses his existing articulation skills in a meaningful way at the phonological level. Auditory feedback allows a hearing child to differentiate between the sounds he makes, to derive meaning from the sounds that others make, to be aware of the extent to which his own speech corresponds with that of others and, most importantly, to continuously self-correct and improve his own articulation. This is illustrated in Figure 2.5 by means of arrows that show how correct articulation is established. Before sounds are used meaningfully (phonologically), children babble most of the sounds (phonetically), in reduplicated sequences of syllables, to reach an automatic level. Spontaneous vocalisation based on physiological conditions steadily increases and babbling becomes stabilised (Mavilya, 1972). Nearly all infants with normal hearing begin the canonical stage of babbling before 10 months of age (Roug et al., 1989). Late onset of canonical babbling may be a predictor of speech and language disorders. The phonetic repertoire will gradually and spontaneously turn into meaningful use. Jakobson (1968) claimed that children learn contrasts, not individual sounds, in a certain order from maximal to minimal contrast.

Figure 2.5. Normal-hearing children’s acquisition of language and speech.

The hearing child produces and differentiates meaningful sounds by comparing his own and others' articulatory patterns through proprioception, that is, audition and orosensory-motor control (Ling, 1976). At the age of 18 months, children with normal hearing master the production of about 50 words.
According to Oller (2000), researchers currently agree on four schematic stages of normal vocal development:
• The Phonation stage (0-2 months)
• The Primitive Articulation stage (1-4 months)
• The Expansion stage (3-8 months)
• The Canonical stage (5-10 months)
David Crystal (cited in Finnerty, 1996) describes the child’s vocal development in these stages as basic biological noise, cooing, vocal play, and babbling, respectively. He adds a fifth stage, between 9 and 18 months, that involves melodic utterances.

The blind child, however, has no access to visual information. It would then seem likely that this would affect his/her phonological development. In fact, some studies (Wills, 1981; McConachie, 1990; Preisler, 1991; Thorén, 2002) have reported that speech and language acquisition is delayed in blind children, regarding both the production of the first words at about 18 months of age and of two-word sentences at about 2 years of age. Mills (1983) has reported that facial movements in blind children are described as "muted", and the articulation is described as less distinct. Little research has been done on very young children, and no exact comparison has been made with sighted children at the same stage of development. Most research has been done on blind children at the age of 6 to 18 years. According to Mills (1983, p. 156), "blind children will follow a different and slightly slower path in earlier phonology compared to sighted children and this is attributed to the absence of lip-read information". Three visually handicapped children were studied at the age of 1-1.5 years. The children showed slower acquisition of those sounds that had a visible articulation compared to sighted children. They also made different types of errors and had a slower development. However, in the long term the lack of lip-read information was not crucial for the development of phonology.
As they got older, the acoustic self-correcting feedback, used to compare spoken sounds with identical ones produced by adults, became more important. In the long term the studied children showed no sign of developing a disordered phonological system. Göllesz (1972) investigated Hungarian vowels articulated by blind children at the age of 13 years, using electromyography and spectrography. He found more pitch modulation and fewer lip movements compared with normally sighted children. However, the spectrograms showed no deviations in the acoustic properties of the productions.

Imitation and self-correction are important factors in speech learning, as has been stated above. Children who are born with an auditory deficit have a limited acoustic speech target to imitate and compare their own production with. Children with moderate hearing impairments acquire speech through residual hearing and vision. The speech production ability of a hearing-impaired child is to some extent proportional to the severity of the impairment. This relationship is valid up to the point of profound hearing losses (Monsen, 1983), see section 2.2.2. However, the conventional descriptors such as mild, moderate, and severe hearing thresholds cover a wide range of hearing losses and different shapes of audiograms. Therefore there is a wide variation in speech acquisition among hearing-impaired children. The use of hearing aids from an early age and special education are of particular importance in supporting speech acquisition. “While amplification may permit improved access to some aspects of the produced signal, amplification alone may be inadequate in many cases for successful acquisition of spoken language” (Mahshie, 1996, p. 153). Profoundly hearing-impaired children have to rely mostly on the limited visibility of phonetic features in learning oral speech and on orosensory-motor control in maintaining speech movements.
Other senses must replace the auditory feedback that hearing children use to adjust and modify their articulation when they learn to speak. These children seldom develop speech spontaneously; instead, their speech is traditionally developed through structured training, using the visibility of speech articulation, reading, tactile sensations and, if possible, residual hearing. The limited speech perception potential of profoundly hearing-impaired children and their reliance on lip-read information cause them to articulate speech segments and prosodic features deviantly. Despite these limitations they may develop systematic phonological processes, which however differ from those of normally hearing children, for instance voicing: [dçka] is realised [dç:ga], devoicing: [rO:d] is realised [{çt:], fronting: [sE:l] is realised [fQ{], backing: [∆ø:ra] is realised [hy:{a] and stopping: [hA:v] is realised [hAp|]. Several studies have shown that profoundly hearing-impaired children possess some kind of abstract and stable phonological system (West and Weber, 1973; Oller and Kelly, 1974; Dodd, 1976, 1988; Oller and Eilers, 1981; Abberton and Fourcin, 1985; Öster 1989a, 1991, 1992a, 1992b), but these systems may differ from those of normally hearing speakers. Dodd (1988) has discussed the fact that the limited information available in visual aspects of speech may be used to develop a phonological system. She suggests that strong evidence is provided that lip-read and heard speech is processed in a code insensitive to input modality. A phonological system can be derived from hearing by ear or from hearing by eye, but the resulting systems will differ in some respects. The fact that profoundly hearing-impaired children tend to produce a speech sound in the same deviant manner in similar contexts was confirmed in a study by Öster (1989a).
A 15-year-old profoundly hearing-impaired boy, who attended the 8th grade at the School for the Deaf in Stockholm (the Manilla School), was recorded twice at an interval of three months. On both occasions the child read the same speech material, which consisted of monosyllabic and disyllabic words as well as sentences that contained these words. The words were chosen to be common and familiar to the child. Some of the consonants did not occur in all possible Swedish positions, as can be seen in Table 2.3. The video recordings were transcribed using the symbols of the International Phonetic Alphabet. Many peculiarities and fusions of errors occur in prelingually profoundly hearing-impaired children's speech, which made an expansion of diacritical marks necessary. Some of those that Bush et al. (1973), Grunwell (1987), and Roug, Landberg, and Lundberg (1989) developed for the transcription of babbling and phonetic development in early infancy were used (see Appendix I). The results of a narrow phonetic analysis of the two recordings are summarized in Table 2.3 according to the position in the word and to standard phonological representation. The results show that there was a high stability in the child's consonant production and the phonological representation between the two readings. Each representation is based on at least four readings and represents words in isolation as well as words in sentences. The only exception that showed a phonological instability between the two readings was the use of the fricatives /s/ and /S/ in initial and medial position.

Table 2.3. Narrow phonetic analysis of the child’s two readings (reading I and reading II) according to standard phonological representation and to position in the word.

2.2.4. The perception of speech through vision

The perception of speech through vision is difficult because many articulatory features of speech are not easily accessible from visual observation.
Acoustically each speech sound is unique, but visually many sounds are hard or impossible to discriminate. Some speech sounds have almost identical visual articulatory movements while others have non-visible articulation. Moreover, coarticulation influences the visibility of many speech movements. Any set of speech segments that is visually contrastive from another is called a viseme (Woodward & Barber, 1960; Fisher, 1968). According to Martony et al. (1970), confusions in both articulation and perception should therefore occur within visemes but not between them. According to Risberg (1982), the lip-reader can identify major differences in the place of articulation but has difficulties with the manner of articulation. Woodward & Barber claimed that there are four visually contrastive units: bilabials, rounded labials, labiodentals and non-labials. However, according to Owens & Blazek (1985), the set of visemes varies from study to study due to differences in languages, talkers, stimuli, subjects' response tasks, and effects of vowel contexts. They examined the effect of vowel context on viseme identification and found that the vowel /u/ limited the number of contrastive visual units compared to /a/ and /i/. Martony, Risberg, Agelfors, & Boberg (1970) found three visible groups for Swedish consonants according to place of articulation: bilabials /m, b, p/, labiodentals /v, f/, and 'others' /t, d, s, n, C, Ó, l, j, r, k, g, N, h/. As much as 82% of the Swedish consonants belong to the group 'others'. Concerning Swedish vowels, two groups were identified (rounded/unrounded) due to the visibility of lip rounding and jaw opening. The unrounded vowels /a/ and /i/ and the rounded /u/ were also visually contrastive. Martony (1975) emphasized this fact in a study where he showed that the distinction rounded/unrounded was consistently correctly produced by Swedish deaf children.
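The viseme grouping reported above for Swedish consonants can be read as a simple lookup table. The following Python sketch is only an illustration and is not part of the cited studies: the names VISEME_GROUPS, viseme_of and visually_confusable are hypothetical, the consonant symbols are copied as they are printed in the text, and the rule that confusions occur within but not between visemes is the idealised expectation attributed to Martony et al. (1970), not a measured confusion matrix.

```python
# Viseme groups for Swedish consonants as listed in the text
# (Martony, Risberg, Agelfors & Boberg, 1970).
# Assumption: symbols are copied exactly as printed in the source text.
VISEME_GROUPS = {
    "bilabial": ["m", "b", "p"],
    "labiodental": ["v", "f"],
    "others": ["t", "d", "s", "n", "C", "Ó", "l", "j", "r", "k", "g", "N", "h"],
}

# Invert the table so that each consonant maps to its viseme group.
CONSONANT_TO_VISEME = {
    consonant: group
    for group, consonants in VISEME_GROUPS.items()
    for consonant in consonants
}


def viseme_of(consonant: str) -> str:
    """Return the viseme group of a consonant (hypothetical helper)."""
    return CONSONANT_TO_VISEME[consonant]


def visually_confusable(a: str, b: str) -> bool:
    """Idealised rule: two consonants are expected to be confused by a
    lip-reader only if they belong to the same viseme group."""
    return viseme_of(a) == viseme_of(b)
```

Under these assumptions, visually_confusable("p", "b") is true because both consonants are bilabials, whereas visually_confusable("p", "f") is false because the two consonants belong to different visible groups.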
Erber (1974) stated that about 40 English phonemes are cut down to roughly 16 visemes in conversational speech because manners of articulation, such as voicing and nasality, are not visible. Markides (1989) stated that lip-reading gives correct identification of about 30-40% of initial consonants and only 20-30% of final consonants. According to Ewing (1941), a discrimination of 70% for consonants is required to understand speech efficiently. Consequently, prelingually severely and profoundly hearing-impaired children, who mostly rely on lip-reading, cannot achieve this.

2.2.5. Age of onset and speech quality

The age of onset seems to be very important for the kind of speech a severely or profoundly hearing-impaired person produces. Many studies have shown that speech production will differ both in severity and in type depending on the age of onset (Zimmermann and Rettaliata, 1981; Plant and Hammarberg, 1983; Öster 1988b; Cowie and Douglas-Cowie, 1992; Fischer, 1995). If the hearing loss is acquired prelingually, that is, before speech and language are acquired and before the critical age of around six (Binnie et al., 1982), the effects on speech production are great, both on segmental articulation and on prosody. If the hearing loss is acquired postlingually, when the most active period of speech and language development has ended, the loss of auditory control will not affect voice and speech immediately. Changes, when they do occur, tend to appear gradually. The reason for this is, according to Zimmermann & Rettaliata (1981), that the postlingually profoundly hearing-impaired speaker possesses memory traces of speech patterns and that these remain for a long time despite the lack of auditory control. Postlingual hearing impairments are far more common than prelingual impairments. Common treatments include hearing aids and learning to lip-read.
The age of onset has an important effect on the speech quality of severely and profoundly hearing-impaired children because children who are born with a profound hearing loss have no adequate acoustic speech target to imitate and compare their own production with. These children very rarely develop speech spontaneously but use sign language as their first language for the purpose of communication. However, prelingually and profoundly hearing-impaired children might develop intelligible speech for the purpose of making statements through structured training using the visibility of phonetic features, reading, tactile sensations of the therapist's face, throat and exhaled air and, if possible, residual hearing. Other senses such as vision and tactile stimulation must replace the auditory feedback for control and self-correction, which are important factors during speech acquisition. The child must see the words spoken in the same relationship many times, just as the hearing child has to hear them many times before understanding comes.

The fact that prelingually and profoundly hearing-impaired children are unable to hear their own spontaneous vocalisations deprives them of motivation to babble and prevents them from expanding their repertoire of speech sounds. Mavilya (1972) found both qualitative and quantitative differences in spontaneous vocalisations between three profoundly hearing-impaired infants and one normally hearing infant over a period of 3 to 7 months. Most of the vocalisations of the profoundly hearing-impaired infants were vocalic. Consonantal sounds were rare and not as well articulated as those of the normally hearing infant. The effects of early hearing impairment on babbling were investigated by Kent et al. (1987). The study followed the phonetic development at 8, 12, and 15 months of identical twins differing in auditory function.
One of the twins had a bilateral profound hearing loss (H) and the other had normal hearing (N). The major results were that H rarely produced consonants other than alveolar and bilabial stops, that H at 24 months barely produced the range of syllables that N produced at 8 months, that H developed a restricted usage of vowel formant patterns, and that H developed an extremely variable fundamental frequency pattern and vocal fry. According to Stoel-Gammon and Otomo (1986), late onset of babbling, a decreasing amount of babbling during the second half-year of life, or atypical patterns of canonical babbling may indicate a profound hearing impairment. According to Oller (2000), babies that begin canonical babbling after 10 months can be suspected of having a profound hearing impairment. However, Wallace et al. (2000) reported that some of twenty deaf or hard-of-hearing infants with very simple babble production between 5 and 13 months developed intelligible speech, whereas some of the children with more complex early babble production were not speaking intelligibly by 5-10 years of age. The result indicates that, for children with a profound hearing loss, articulation skill might be more dependent on developing an effective motor feedback acquired either visually or through touch.

2.2.6. Speech quality of profoundly hearing-impaired children

Over the years, a number of qualitative and quantitative studies have documented types of segmental and prosodic deviations that are typical of the speech of prelingually and profoundly hearing-impaired children, e.g., Ling (1976), Calvert (1961), Calvert & Silverman (1975), Hochberg et al. (1983) (see review by Gold, 1980). Prosodic aspects have not been studied as comprehensively as segmental aspects and show a much wider range of individual differences. Early studies were for the most part case studies based on collected diary accounts.
Later on, more extensive studies appeared, for example the classic study of Hudgins & Numbers (1942), which documented the speech of 192 deaf children ranging in age from 8 to 20 years. Vowels are extremely difficult for a profoundly hearing-impaired child to acquire because of the lack of sufficient visual cues. Monsen (1976) found that the reduction of the space of vowel articulation was commonly due to a restricted range of the second formant, which is primarily responsible for the non-visible forward and backward movement of the tongue. This is in accordance with the result by Martony (1975), where spectrographic measurements of the fundamental frequency and of the first, second, third, and fourth formants were analysed for Swedish children between 11 and 17 years of age with pure-tone averages (PTA) between 75 and 115 dBHL. He showed that jaw and lip positions generally were correct but tongue positions incorrect, which shows the impact of visibility on profoundly hearing-impaired children's speech acquisition.

2.2.7. Prerequisites for severely and profoundly hearing-impaired children to develop intelligible speech

Severely and profoundly hearing-impaired children joining speech therapy should possess certain essential skills in order to develop intelligible speech. A screening and, when possible, training of the following conditions should be done before the therapy starts:
• Case history. Factors like the type of hearing impairment, the degree of severity, the age of onset, special education, when a hearing aid was first fitted, use of hearing aid, deafness in the family, etc. have a strong influence on what kind of speech a child born with a profound hearing loss might develop.
• Speech organ structure and function. An oral-peripheral examination should be done to investigate possible functional disorders that could prevent speech development (Ling, 1976).
• Proprioception. Children who are born with a profound hearing loss learn speech sounds through differentiation, involving the orosensory-motor patterns that their speech creates within the vocal tract (Ling, 1976). Therefore, proprioception should be screened and trained in order to test and train the child's ability to differentiate between the sounds she or he makes. This could be done through productions of vocalisations that the child differentiates through proprioception of specific orosensory-motor patterns. Plenty of rhythm, movement, babbling and balance activities are recommended.
• Imitation skills. The hearing child uses vision to find out how to articulate speech sounds and then imitates these speech movements. Audition is used as a feedback channel for control and for comparison of the child's output with that of the surroundings. Profoundly hearing-impaired children use vision as a substitute for audition and lack the option of feedback and self-control. Therefore imitation is a fundamentally important skill for profoundly hearing-impaired children to possess (Ling, 1976). According to Fletcher (1986, p. 236), “subjects with a hearing deficit will have enhanced skill in using visual information during oral motor tasks.” Testing and training can be done, for example, by reversing the roles of teacher and child and imitating speech gestures: putting out/withdrawing the tongue, moving the tongue tip up/down, rounding/spreading the lips, etc. Explanations of how the sound is produced and where the articulators are could be given in sign language if needed, or shown through the use of a mirror, through articulation pictures, with the therapist acting as a model, or through tactile and visual feedback.
• Functional hearing. The quality of the child’s residual hearing for speech perception and control of his own speech production, the functional hearing, might be tested and trained through speech discrimination training with spoken stimuli of phonetic contrasts, using illustrations of appropriate minimal word pairs that the child has to point to.
• Exteroception. Exteroception, that is, the child’s possibilities of deriving meaning from the sounds that others make and of interpreting sounds as meaningful, implies knowledge of the sound system of the target language (Ling, 1976). This awareness is necessary to develop intelligible speech and can be attained through functional hearing or through visual or tactile feedback. It could be tested and trained through audio-visual stimuli of speech, speech gestures and environmental sounds.
• Lip-reading and visual capacity. Lip-reading skills and good vision are of course prerequisites for children with profound hearing losses to be able to develop intelligible speech through effective and appropriate visual strategies.
• Sign language skills. A good knowledge of sign language is extremely important, as the speech therapist gives instructions and explanations about articulation positions and the types of visual feedback offered in computer-based speech therapy.

2.3. Need, aim, and possibilities of speech therapy

2.3.1. Moderately hearing-impaired children

Children with a moderate hearing loss might develop speech spontaneously, but the speech contains many deviations and is often delayed. The major factor is whether they can hear their own voice or not. A hearing loss affects a child’s possibilities to perceive his or her own sounds as well as the speech of people surrounding him or her.
With the help of an early hearing-aid fitting, a hearing-impaired child’s possibilities to develop speech through self-correction will increase a great deal. However, the child often needs help with some aspects such as fricatives, plosives, and pitch and stress patterns. The aim of speech therapy for hearing-impaired children born with a moderate loss is the correction of already established deviations. A hearing-impaired child must also get help in interpreting auditory sensations as meaningful and in identifying the sound system of the target language.

2.3.2. Prelingually severely and profoundly hearing-impaired children

Prelingually and profoundly hearing-impaired children, who more or less lack auditory feedback, have great possibilities to control and self-correct their own speech production with the help of computer-based visual feedback. This is illustrated in Figure 2.6 by means of dashed arrow lines. See Figure 2.5 for a comparison with the hearing child, who acquires speech and language spontaneously through a combination of hearing and vision. The purpose of computer-based speech training for severely and profoundly hearing-impaired children is that they should develop intelligible speech for the purpose of making statements and more seldom for the purpose of communication. They should learn a “survival speech” that makes it possible for them to give and to understand simple messages in shops, in the street, etc. Most profoundly hearing-impaired children only receive 30-60 minutes of individual training per week. No speech training or correction is practiced in the classroom during ordinary lessons. These limited opportunities for speech therapy have increased the demands for efficiency. In a speech-training program it is important to make the child aware of the manner and place of articulation as well as of contrasts between similarly produced speech sounds.

Figure 2.6. Assistance of computer-based visual feedback in profoundly hearing-impaired children’s speech development.

Distinctive features such as nasality and voicing are not visible to the child through speech reading and are consequently difficult to produce correctly. A computer-assisted aid is capable of offering a child immediate and meaningful visual feedback on these contrasts. With this technique it might also be easier for the therapist to instruct and explain what is wrong and what is correct in the child’s production. The use of sign language for instruction and explanation may assist the speech therapist and help the child follow important steps during speech learning. The aim of speech therapy for profoundly hearing-impaired children is to assist them in their speech acquisition. The most important goals are to make their speech intelligible, to expand their existing articulation skills, and to facilitate linguistic use.

2.3.3. Adult second language learners

Speech therapy for L2 learners aims at an almost perfect pronunciation, both segmental and prosodic, of a new language. Learning a new language as an adult is far more difficult than as a child. The brain of a child has plasticity, and new networks and synapses can be developed much faster than in adults (Ellegård, 1982). For an adult, already established associations from the mother tongue must be changed. The adult wants to use the new language straight away and very quickly learns the vocabulary of the second language. However, learning to pronounce the words correctly requires a great deal of patience and a great amount of fundamental training in both perception and production of new contrasts and of contrasts that interfere with the sound system of the native language.
A child learns the pronunciation of the words of the second language at the same time as he learns to utilize them and consequently gets the required fundamental training that is needed (Ellegård, 1982). Furthermore, a child has fewer demands upon the foreign language than an adult person has because the child talks about “here and now” and does not make use of the language abstractly as an adult needs to do. To sum up, children who learn a second language are able to add new patterns of behaviour to already established motor patterns while adult L2 learners have fewer possibilities to develop new connections to already established patterns. 2.3.4. Speech therapy aims Clients with deviant speech need speech therapy for different reasons. For moderate hearing-impaired children the goal is to improve articulation of especially the sibilants, to improve awareness, flow and consistency. However, the purpose of speech training for profoundly hearing-impaired children is that they should expand their systems and develop intelligible speech for the purpose of making statements and more seldom for the purpose of communication. As explained in section 1.4.1 they should learn a “survival speech” that makes it possible for them to give and to understand simple messages in shops, in the street, etc. The ambition for L2 speakers is quite the opposite. Speech therapy aims at helping them sound as native as possible. 35 Computer-Based Speech Therapy Using Visual Feedback 2.3.5. General steps in an individual therapy An individual and efficient speech therapy may vary somewhat from teacher to teacher but there are certain general steps that must be followed, irrespective of age or the client’s need for speech training (Risberg, 1968; Mahshie, 1995; Öster 1989b, 1989c, 1995b, 1995d, 1996). The following steps are recommended: • Diagnosis. 
The first step of speech therapy involves an individual assessment of the deviations that should be corrected and trained in order to increase the intelligibility of the speech or, as for L2 learners, to make the speech sound as native as possible. All assessment is time-consuming and laborious. However, to prevent speech therapy from becoming a series of meaningless "articulatory gymnastic" sessions, the therapist must be prepared to spend time carrying out an appropriate diagnosis prior to therapy. Therapy and assessment are inseparable as assessments are required regularly during the training program.
• Instructions that aim at awareness, correct realisation, and understanding. It is important to make the learner aware of how his/her production deviates and to show him/her how to correct it.
• Initial training to obtain automaticity. When the learner has acquired the target production, significant amounts of training must be invested to get him or her to maintain correct production.
• Additional and repetitive training for generalisation and transfer to linguistic use. This is the most important element in a speech training program but the most difficult to carry out. The target production must be repeated and practiced in a variety of contexts. The ultimate goal is a system expansion that is accomplished when the learner’s best production becomes his or her most common production.

3. Development of speech technology in speech therapy for severely and profoundly hearing-impaired children

3.1. Introduction

Since the early 1960s many training devices have been developed for profoundly hearing-impaired children, who learn a motor pattern for each speech sound instead of the acoustic output that normal-hearing children learn through audition.
Early visual aids presented acoustic information of specific speech sounds visually in real time, like isolated vowels and nasals as well as the s-sound (Pickett and Constam, 1968; Pronovost et al., 1968; Cohen, 1968; Risberg, 1968; Upton, 1968; Thomas and Snell, 1970). Many of these visual displays showed either too much or too little information, and in most cases considerable cognitive processing was needed to identify a speech sound, according to Thomas and Snell (1970). In addition, the visual feedback in these early displays was often too technical, illogical and difficult for a child to understand. Moreover, the child got no information about important improvements as the feedback mostly showed only whether the pronunciation was correct or incorrect.

Nowadays, efficient computer-assisted speech therapy makes use of logical and amusing visual feedback that is easy for the child to understand. It offers a child the possibility to perceive non-visible speech articulation, to imitate and compare the vocal output with that of the therapist and to display the child’s smallest improvement. Beyond training on the phonetic level, extensive training on the phonological level is possible to help the child to make use of his or her oral motor pattern in speech.

In the following sections a survey is given of some essential milestones in the development of technical aids for speech therapy for severely and profoundly hearing-impaired children, from early non-computerized aids to more sophisticated computer-aided systems.

3.2. Process and product-oriented therapy systems

Traditional speech therapy with profoundly hearing-impaired children is based on direct imitation methods that help the children learn speech on a phonetic level by looking at the therapist’s face and lips, through residual hearing or by feeling the therapist’s face, throat and expiration air.
The speech therapist gives instruction on how to use the speech organs while forming sounds. This is called a process-oriented approach (Povel and Arends, 1991). Process-oriented means that such a system gives instructions on how to use the speech organs while forming various speech sounds by showing the placement of the articulators from the outside and/or the inside of the mouth. A process-oriented system may replace the therapist in some stages of the training, especially in the instruction phase, to guide the child in how to move the articulators to improve the articulation.

In contrast, most of the existing computer-based speech therapy systems are at present product-oriented, as they give acoustic information (spectral or temporal analysis) about the speech product and the final result. These systems based on acoustic information do not give information about how the sound may be articulated. They offer interactive parametric feedback by showing various visual representations of acoustic parameters in real time. A target speech model is usually created to make it possible for the child to evaluate his/her own speech compared to a target of accepted production. Simple and game-like visualisations as well as “speech pictures” are also used to draw the child’s attention to important parameters and help the child recognise whether his/her pronunciation is improving or not. A product-oriented computer-based speech therapy system principally requires the presence of a teacher for supervision and guidance of the adjustments of the articulators that are needed and to interact with the child by sign language if required.

3.3. Tactile aids

Speech can be presented through vibrators at the fingertips and other parts of the body to indicate various elements of speech (Plant 1960; Spens 1984; Traunmüller 1980).
Many devices ranging from small portable aids to more sophisticated non-portable multichannel systems have been developed for awareness of environmental sounds as well as for speech reception. A comparison of different tactile communication aids was given by Spens (1984), see Figure 3.1.

Figure 3.1. Different tactile aids with their respective places of stimulation and originators (Spens, 1984).

Tactile aids can be classified in terms of how they produce touch sensations, as either vibrotactile, with signals delivered by one or more vibrators on the skin, or electrotactile, with sensations produced by electrical stimulation of the nerves that lead from the touch receptors into the skin. Tactile aids have been shown to be good communication aids in supporting lip-reading as they provide information on rhythm, intensity and segmental boundaries (Pickett, 1980; Spens, 1984). They can help to increase the sensation of the rhythm of music and the monitoring of a hearing-impaired person’s own voice for loudness and pitch. Research has been reported by Schulte (1972) and Proctor (1995) concerning tactile aids and speech therapy. However, the use of tactile aids has been shown to be less successful due to a major discrepancy in frequency range between the hearing sense and the tactile sense (von Békésy, 1959). The ability to detect vibrations on the skin is highest around 250 Hz and is almost absent above 1000 Hz. The most sensitive area for hearing lies in a higher and wider frequency range, the same range in which most of the important parts of speech are conveyed (see section 2.2.2, the Speech Banana). However, this was compensated for by using several vibrators each covering different ranges of the speech spectrum as can be seen in Figure 3.1.
Tactaid VII, which is available on the market, presents sounds via seven vibrators providing information on voiced/voiceless sounds, intensity, temporal cues, and the first two formants. This information is conveyed by the place, movement, strength, and duration of vibration (http://www.tactaid.com/tactaid71.html, May 2006).

3.4. Product-oriented visual aids

The visual modality has been the most preferred feedback method used in speech training with hearing-impaired children. The main motivating force for developing visual aids was a wish to display motor gestures that were non-visible and to support hearing-impaired children in developing a proprioceptive control of their speech gestures. These early visual aids were all based on some extracted acoustic information presented one by one in each display. This rendered the therapy inflexible as the teacher had to change instrument according to the different training needs of the child. Unfortunately, most of these visual displays were seldom evaluated within a pedagogical programme. Despite the fact that many of these aids have been reported to improve the speech of some children, their use was limited. This was probably due to the fact that the visual feedback provided by this type of visual aid was
• difficult to understand
• unnatural
• delayed
• unattractive
• without motivational impact on the children

Furthermore, due to technical problems and the lack of pedagogical manuals and evaluations, the teachers were not motivated to use them as standard procedures.

3.4.1. Feature based visual indicators

Three different indicators were designed and developed during the 1960s at the Speech Transmission Laboratory at KTH, Dept of Speech, Music and Hearing. Various speech-analysing techniques were integrated in these three indicators, making it possible to train s-sounds, nasalization, intonation, pitch and rhythm.
A contact microphone picked up the vibration of the vocal folds and the nasal resonance. The measured frequency value of a child’s production was indicated by a special lamp, showing a green light for a correct value and a red light for an incorrect value, or by the deflection of a meter needle along a frequency scale (Risberg, 1968; Martony, 1971). This type of visual feedback told the child whether the production was correct or not but gave no navigational information or explanation of what to do to obtain an improved production.

Figure 3.2. Intonation indicator with instrument display.

According to Risberg (1968) this type of visual aid was most useful in the instruction phase of speech therapy to describe and define the task. A promising result of voice register training was obtained by Martony (1966) with the intonation indicator (see Figure 3.2) in combination with an oscilloscope display.

3.4.2. Vowel and fricative displays

A visual spectrum indicator (LUCIA) was also built by this group and was used for vowel and fricative training with hearing-impaired children at the Manilla Deaf School in Stockholm. The instrument used 20 band-pass filters in the frequency range of 200-7000 Hz. A matrix of 20 x 10 ordinary incandescent lamps gave light in 10 steps on the amplitude scale, each step about 3 dB, 30 dB in total, for showing important spectral details of long Swedish vowels (see Figure 3.3). An innovation in this instrument was a built-in memory that made it possible to study the child’s attempt in detail in order to give further instructions about correct behaviour.

Figure 3.3. LUCIA with spectra of Swedish long vowels.

3.4.3. Formant displays

Displays showing the relationship between formants were developed by Thomas and Snell in 1968, Pickett and Constam (1968) and Watanabe et al. (1985).
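As an illustration of the LUCIA lamp matrix described in section 3.4.2, the following minimal Python sketch quantises 20 band levels into 10 lamp steps of 3 dB each. The counts (20 bands, 10 steps, about 3 dB per step) come from the text. The reference level `top_db`, the rounding rule, the function names `lamps_lit` and `render_matrix`, and the example levels are assumptions made for illustration only and do not describe the original hardware.

```python
import math

N_BANDS = 20    # band-pass filters, as stated in the text
N_STEPS = 10    # lamps per column, as stated in the text
STEP_DB = 3.0   # approximate level step per lamp, as stated in the text


def lamps_lit(band_levels_db, top_db):
    """Return how many lamps (0..N_STEPS) are lit in each of the 20 columns.

    top_db is an assumed reference level at which a column is fully lit;
    the display floor is then top_db - 30 dB.
    """
    if len(band_levels_db) != N_BANDS:
        raise ValueError(f"expected {N_BANDS} band levels")
    floor_db = top_db - N_STEPS * STEP_DB
    counts = []
    for level in band_levels_db:
        # Assumed rule: one more lamp for every full 3 dB above the floor.
        steps = math.floor((level - floor_db) / STEP_DB)
        counts.append(max(0, min(N_STEPS, steps)))
    return counts


def render_matrix(counts):
    """Draw the lamp matrix as text, top row first ('#' lit, '.' dark)."""
    rows = []
    for row in range(N_STEPS, 0, -1):
        rows.append("".join("#" if c >= row else "." for c in counts))
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical band levels in dB, not measured data.
    levels = [60, 45, 30, 29, 100] + [33] * 15
    print(render_matrix(lamps_lit(levels, top_db=60)))
```

Each text column corresponds to one filter band and each row to one 3 dB step, so a stored attempt could in principle be redrawn from the list of lamp counts alone, which is one possible reading of the built-in memory mentioned above.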
With these formant displays different vowels could be trained by matching visual patterns. The device by Thomas and Snell, shown in Figure 3.4, used an F1-F2 plot in real time. The location of a vowel on an F1-F2 plot is related to the articulatory configuration required to produce the vowel in the following way: the tongue height is related to the frequency of the first formant in such a way that F1 in a close vowel like /i/ has a lower frequency than F1 in an open vowel like /a/. The position of the tongue along the vocal tract is related to the frequency of the second formant in such a way that F2 in a front vowel like /i/ has a higher frequency than F2 in a back vowel like /o/ (Miller, 1951). The expectation was that the profoundly hearing-impaired child might perceptually associate the position of different vowels on the F1-F2 plot with the correct articulatory configuration.

Figure 3.4. Visual display patterns of sustained vowels (Thomas and Snell, 1970).

3.4.4. Fundamental Frequency displays

Several displays were developed based on a microphone, fundamental frequency extractor and oscilloscope to improve voice pitch level and the fundamental frequency variation produced by profoundly hearing-impaired children (Plant, 1960; Martony, 1968; Boothroyd, 1970). A training system for hearing-impaired persons as well as for L2 learners was the “Laryngograph” (Fourcin & Abberton, 1971). This system was used quite successfully. The visualised signal gave a representation of the fundamental frequency and its variation shown as a frequency/time plot. Important information about the prosody of a language as well as voice quality was provided.

3.4.5. Computer-based speech therapy systems with visual feedback

A significant contribution to the next generation of visual aids for speech therapy for hearing-impaired children was made by Nickerson & Stevens (1973) when they developed the first computer-based speech therapy system.
The main reasons for introducing computers in speech therapy for hearing-impaired children at that time were that computers could perform complicated transformations of the speech signal to match the needs of presentation, that computers were on the whole simple to use and that they were easier to modify than machines (Nickerson et al., 1976). Thanks to developments in microcomputer technology, effective visual computer-based systems have been developed and computer-aided speech therapy is currently in common use in clinics, schools and nursery schools.

The pioneering product-oriented computer-based system, made by Nickerson & Stevens in 1973, contained two motivating and playful displays. One was the “basketball game”, shown in Figure 3.5, where the aim was to get the ball into the basket, avoiding the wall, by controlling voice pitch. The height of the ball followed the fundamental frequency variation of the child’s voice. A happy or a sad face showed the result of the effort. This was one of the first amusing attempts to offer a child both rewarding feedback and exact evaluative feedback, as is discussed in section 8.2.2 “type of feedback”.

Figure 3.5. The “basketball game”. By controlling his/her vocal pitch movements the hearing-impaired child should pass a ball into the basket and avoid hitting the wall. The upper part shows a successful trial and the lower part shows a failure (Nickerson and Stevens, 1973).

The other display, shown in Figure 3.6, was the “cartoon face”, where several parameters were shown at the same time. A voiced sound was shown by an “Adam’s apple” on the throat, its height represented fundamental frequency and the loudness was shown by the size of the mouth.

Figure 3.6.
The “cartoon face” showing voicing by the “Adam’s apple”, pitch by its height and loudness by the size of the mouth (Nickerson and Stevens, 1973).

Since then more advanced computer-aided speech training programs have been offered that have enhanced the possibility for speech- and hearing-impaired persons to improve their pronunciation (Osberger et al., 1981; Watson, Reed, Kewley-Port, Maki, 1989; Arends et al., 1991; Yamada & Murata, 1991; Levitt, 1993; Javkin, 1994; Rooney et al., 1994; Öster 1996; Öster et al. 1999b, 2002b, 2003). Most of these systems contain a microphone, an amplifier, and a speaker connected to a sound card that allows the user to input, store and analyse speech and then display it and play it back. The software often contains several interactive programs that have been shown to be successful in assisting speech- and hearing-impaired children in achieving awareness and control over various speech attributes such as voicing, timing, pitch, and loudness as well as refining articulation and prosody.

An example of these advanced computer-based systems is the Visual Speech Apparatus (VSA), developed by Povel and Arends in 1990 (Arends et al., 1991; Arends, 1993). It was designed to be used by a speech therapist who would adapt the training for each child. The program consists of attractive games which are controlled by different aspects of the child’s speech. Figure 3.7 shows two selected exercises from this system. In the left panel three different ranges of loudness are shown for training of sustained voiced or unvoiced sounds. Voiced sounds are presented in red and unvoiced sounds in grey. To the right a training panel for vowel quality is presented.

Figure 3.7. Two exercises from the Visual Speech Apparatus showing a panel for loudness training within certain ranges to the left and a vowel corrector display to the right (Arends et al., 1991).
In this display vowels are shown as points in a panel corresponding to the first and second formant. The child’s attempt is shown on the panel by a star. The intention was that the child should internalise the vowel space and achieve a correct production by changing the tongue position horizontally and vertically.

In the Matsushita speech training system (Yamada & Murata, 1991), also called the CISTA aid (Computer Integrated Speech Training Aid), as many as ten speech parameters, physiological (process-oriented) and acoustical (product-oriented), are displayed simultaneously in real time using five sensors.

Figure 3.8. The Matsushita speech training system (CISTA) showing the five sensors.

The sensors are a nose contact microphone, an airflow sensor, a tongue palate, a neck contact microphone and an air microphone for extracting the following training parameters: nasality, contact pattern of the tongue, expiration airflow, plosiveness, fricativeness, intensity, pitch, intonation and voicing. However, such a sophisticated system with a range of training parameters, technically advanced acoustic and articulatory models, video games and graphical interfaces is likely to cause frustration as it makes great demands upon the therapist. “User-friendliness is a key factor in determining whether a particular system will actually be used in school settings” according to Youdelman (1994, p. 77). Figure 3.8 shows the system with all sensors.

A very important contribution to the development of computer-based speech therapy with hearing-impaired children as young as four years was the approach that was implemented in the Indiana Speech Training Aid (ISTRA) by a new technology for speaker-dependent word recognition (Kewley-Port et al., 1991).
The technique applied templates from a child’s best production for comparison and evaluation of his/her speech quality in speech drills in game-like interfaces. The system gives feedback as a “goodness measure” based on the distance metric of the recogniser as an alternative to a human teacher (Watson, Reed, Kewley-Port & Maki, 1989).

A very efficient and widely used system is the SpeechViewer (Crepy et al., 1983; Crepy et al., 1986) that was developed in a joint research project in 1979 between the IBM France Scientific Centre and professionals of speech therapy for profoundly hearing-impaired children from Institut National de Jeunes Sourds de Paris and Centre Experimental Orthophonique et Pedagogique. Today the third version is still in use and the program has been translated and adapted to many different languages. The Swedish version of SpeechViewer III (translated and adapted to Swedish by the author) is presented and discussed in Chapter 8 together with two other speech therapy programs that have been developed within two European projects with KTH as one of the partners. The two other systems are “Box of Tricks”, developed in the SPECO-project: A Multimedia Multilingual Teaching and Training System for Speech Handicapped Children (Öster et al., 1999b; Vicsi et al., 1999, 2000) and the OLPy therapy system, developed in the Ortho-Logo-Paedia project (Öster et al., 2002b, 2003).

3.5. Process-oriented therapy systems with visual feedback

Many computer-based speech therapy systems used today include a short section of illustrations showing cross-sections of place and manner of articulation during correct pronunciation of all speech sounds of the target language. These cross-sections can be studied and imitated by the child. Figure 3.9 shows articulation pictures of the Swedish s-sound included in Trollerilådan, the Swedish version of Box of Tricks, developed in the SPECO-project.
Figure 3.9. Articulation pictures of the Swedish s-sound included in Trollerilådan, the Swedish version of Box of Tricks, developed in the SPECO-project.

One of the earliest process-oriented technical aids used with deaf children was the Danish Talemat (the Speech Mate) developed by the Institute for Electronic Systems at Aalborg University Centre (AUC) and Aalborgskolen. The aid was not computerised but used a colour TV and a special keyboard to show cross sections of the location of the articulatory organs during various speech sounds. Various articulatory movements could also be visualised by rapid shifts of the schematic cross sections. The Speech Mate consisted of 95 cross sections of isolated Danish vowels and consonants as well as CV-syllables. The program was later made available for work on a PC and was called “PC Talemat” (Lindberg, 1992) and was manufactured by DanVoice in Denmark.

Another process-oriented computer-based speech training program called SIM (Speech Illumina Mentor) was developed that provided dynamic visual information of stored images of internal and external articulators created from a series of magnetic resonance images (MRI). Syllables could be trained through a voice recognition system in a game format (Soleymani et al., 1997).

3.5.1. Physiological devices

A number of devices have been developed which provide direct feedback from physiological sensors. These devices made the child aware of articulatory gestures in the early stages of speech training. The Pneumotachograph (PTG) (Mahshie, Herbert and Hasegawa, 1984; Mahshie and Yadav, 1990) provided aerodynamic feedback obtained from the oral and nasal airflow, oral air pressure, from an electroglottograph, and accelerometer signals. Case studies showed both significant improvements and notable carry-over (generalizations) (Mahshie, 1995).
The Electropalatograph (EPG) (Fletcher et al., 1991) used a sensor to show the tongue and palate contact during speech. An acrylic palate with embedded electrodes registered points of contact on the palate, which were displayed on a computer screen. A similar system called the Palatograph was also included in the CISTA aid (page 126); it provided real-time dynamic tongue-palate contact by means of a tongue position sensor, see Figure 3.10.

Figure 3.10. The contact pattern for /s/ of two speakers. To the left a teacher’s correct pattern is shown and to the right a child’s incorrect tongue contact pattern is shown (Yamada & Murata, 1991).

Contacts between the tongue and inner parts of the oral cavity during the articulation of different phonemes are shown on a screen as dots, which correspond to the individual sensors. The aim of the child is to copy the correct pattern of the therapist. This enables the student to see those speech sounds that are not visible when lip-reading (Yamada & Murata, 1991; Youdelman, 1991).

3.5.2. Automatic speech tutors

More recent research aims at giving knowledge of the performance and processes during articulation. A talking head called Baldi (Massaro & Light, 2004) has been incorporated as a virtual speech therapist who is able to show the movements of his internal articulators. Baldi has been used in the early stages of speech training, giving instructions to hearing-impaired children and L2 learners about how speech sounds should be produced. However, no comparison of the visual feedback with reference to the client’s deviant production has been made with the help of Baldi.

An audio-visual 3D facial model, a talking head, has been developed based on the KTH text-to-speech system (Beskow, 2003) (see Figure 3.11). Besides being used in applications like spoken human-machine dialogue systems, entertaining games, virtual reality and films, etc.
a talking head might function in various communication aids for hearing-impaired users like the talking head telephone SYNFACE (Beskow et al., 2004), in language learning and pronunciation training. The advantages of using an automatic tutor in language learning are that it facilitates learning in a dialogue context and provides an always available conversational partner. Turn-taking and non-verbal feedback like encouragement, affirmation, and confirmation can be indicated as well as emphasis and focus in utterances.

Figure 3.11. The talking face, and the underlying structure of the face model (Beskow, 2003).

The assistance of an automatic tutor in pronunciation training with L2 learners and speech- and hearing-impaired children could be helpful as it provides an untiring model of pronunciation, gives visual feedback for increased awareness, makes it possible to display internal articulations, and can exaggerate difficult sounds and highlight critical organs (Beskow et al., 2000).

Recently a virtual speech tutor, “the ARticulation TUtoR ARTUR”, was developed at KTH in Sweden (Engwall et al., 2004; Engwall et al., 2006; Eriksson et al. 2005; Bälter et al., 2005), see Figure 3.12. This three-dimensional animated computer face with visible internal parts of the mouth shows the important differences between a correct articulation made by the three-dimensional computer face and the child’s production. It is meant to be a self-instructive device in the later stages of speech training to obtain automaticity and transfer to untrained material. With the help of computer games and the feedback from ARTUR the intention is to help the therapist with the repetitive and additional training that is necessary to establish sensory-motor associations.

Figure 3.12.
A display of the articulation tutor ARTUR giving articulatory visual feedback of the tongue position during the production of the voiceless fricative consonant /s/ (Engwall et al., 2006).

The target production can be repeated and practiced in a variety of contexts. The tutor shows the child how to move the tongue, jaw and lips to improve his/her articulation. In this way the child is able to train alone without the guidance of a speech therapist.

3.6. Spoken language training for L2 speakers

Pronunciation training has often been neglected in the teaching of Swedish as a second language. However, nowadays the importance of an acceptable pronunciation by Swedish immigrants is stressed and the use of efficient training methods is emphasised. Traditionally spoken language has been practised in language laboratories by “repeat-after-me” methods. This type of training provides the learner with delayed, auditory and verbal feedback from a teacher.

Speech-interactive language learning systems within the field of Computer Assisted Language Learning (CALL) focus on self-tutoring systems with automatic error detection for the correction of the articulation of L2 speakers (Eskenazi, 1996). Individual training with the guidance of a teacher is impossible due to large classes and limited time. With the help of ASR techniques (Dalby and Kewley-Port, 1999), automatic pronunciation scoring, distance learning and CD-ROMs with pre-recorded utterances (LaRocca, 1994), home training is now possible.

Visual information as additional feedback to the auditory information has been used in Swedish spoken L2 training by Gårding & Bannert (1979), McAllister (1986), and Flege (1989) among others. Today two programs for Swedish are available that make it possible to listen to pre-recorded speech in the form of dialogues.
The intention is that the learner should pronounce the text and then compare a graphical curve of his pronunciation with the pre-recorded speech (Kjellin, 1997; LINGUS, Larson Education AB). However, computer-based pronunciation training with audio-visual contrastive feedback has been used very rarely although it has been shown to provide a valuable resource in spoken L2 teaching by Öster (1997, 1998).

4. Residual hearing for speech processing - methods of investigating the functional hearing for speech

4.1. Introduction

How is the speech intelligibility of profoundly hearing-impaired children related to audiological data? Is there some measure of their residual hearing that could predict a child's possibility to develop intelligible speech? Studies reported in this chapter indicate that a profoundly hearing-impaired child’s speech intelligibility is mostly related to the quality and the use the child has been able to make of his residual hearing, measured through the ability to recognize simple speech stimuli. The development of a computer-based analytical speech perception test for diagnostic purposes of small children from four years of age is also described in the final part of this chapter. The perception test contains easy and familiar illustrated word pairs of minimal contrasts.

As has been stated in section 2.2.2, it will be insufficient to rely on pure-tone audiometry when estimating a profoundly hearing-impaired child's possibilities of developing intelligible speech, as it gives no information about the child’s speech processing capabilities. Speech is made up of complex and rapidly changing acoustic events (Ling, 1976) and the pure tone audiogram cannot indicate a profoundly hearing-impaired child’s ability to distinguish between frequencies, to track formant transitions or to detect differences in intensity.
The ability to perceive even simple speech material may not correlate with the ability to hear pure tones. For children with pure-tone averages worse than 90 dBHL, sound might be perceived through vibrotactile rather than auditory receptors. Vibrotactile perception is mostly limited to speech-envelope features like duration and intensity (Erber, 1974a). Auditory perception also discriminates spectral features like small differences in fundamental frequency and vowel formant patterns. The pure tone audiogram will not differentiate "vibrotactile" from "auditory" children as it provides insufficient information about speech-processing capabilities, like the ability to perceive gap durations and small differences both in frequency and intensity.

Using the pure tone audiogram to predict a child’s speech perception abilities is inappropriate because it does not reflect a child’s capacity to perceive complex sounds like speech. According to Osberger (1992) individuals with similar hearing sensitivity may demonstrate very different speech reception abilities. The sort of speech a profoundly hearing-impaired child develops depends not only on the amount of hearing but also on the quality and the use the child is able to make of his/her residual hearing, the so-called functional hearing for speech.

In order to investigate the relation between the speech intelligibility of eleven profoundly hearing-impaired children and their residual hearing capabilities a study was carried out (Öster 2002a) that contained three parts. In the first part of the study, the relation between the amount of residual hearing, measured as the better-ear average of pure-tone thresholds at 500, 1000 and 2000 Hz, and the intelligibility of their speech was investigated. In the second part, the interrelationship between the quality of residual hearing (defined as the shape of their audiogram) and the intelligibility of their speech was studied.
In the third part, the relation between the degree to which the children could use their residual hearing for speech (the functional hearing) and the intelligibility of their speech was studied.

4.2. Functional hearing

The term functional hearing is used to describe the degree to which a child can use his residual hearing for speech perception and control of his own speech production. The quality of the residual hearing is crucial for a child's ability to perceive speech and to develop intelligible speech and depends on many factors such as amount of hearing aid use, amount of auditory training, discrimination ability of speech features, learned ability to identify speech sounds, phonological short-term memory and speech processing capabilities. To some degree functional hearing can be trained, as some factors, for example phonetic and linguistic interpretation, might be influenced by learned ability to identify the speech sounds of the target language. The hearing capacity on the other hand is physiological and cannot be trained. Consequently, the residual hearing capabilities for speech (the functional hearing) of a profoundly hearing-impaired child depend on the following factors:

• amount of hearing aid use
• amount of auditory training
• discrimination ability of time and frequency (hearing acuity)
• phonetic interpretation capacity (learned ability to identify speech sounds)
• speech processing capability (to combine fragmentary information to a whole meaningful message, including phonological short-term memory)
• linguistic interpretation capacity (to understand running speech and simple questions with or without speech reading)

4.3. Functional hearing and speech intelligibility

4.3.1. Subjects

Eleven children from the Manilla School for the Deaf, five boys and six girls, were selected to cover the range from good to poor speakers. They were not chosen according to degree of their hearing loss.
All of the subjects had been trained with traditional speech therapy. Their pure tone averages (PTA) at 0.5, 1, 2 kHz were between 90 and 108 dBHL in the better ear. One child was eleven years of age, while the others ranged from fifteen to seventeen years. Age, sex, mean hearing loss and shape of audiogram (cf. Figure 2.3) are shown in Table 4.1. The children were educated in sign language and their hearing losses were in the vicinity of 90 dBHL or more, a range where hearing may be vibrotactile rather than auditory. Their attitudes towards speech therapy as well as their speech intelligibility skills varied from very poor to very good.

Table 4.1. Age, sex, pure-tone averages (PTA), and shape of audiograms for the frequencies 0.5, 1, 2 kHz in the better ear for the eleven children in the study.

Child   Age   Sex   PTA dBHL   Shape of audiogram
1       15    F     90         B3
2       16    F     92         B4
3       11    F     92         B4
4       16    F     95         C3
5       16    F     95         C3
6       15    M     98         C4
7       16    F     100        C4
8       16    M     103        C4
9       16    M     108        C5
10      16    M     108        C6
11      17    M     108        C6

4.3.2. Intelligibility test

The speech of the children was recorded on audio-tape as they read some questions. The questions were so-called Helen-questions (Ewertsen, 1973) of the type “what colour is a lemon?”, which can be answered with one word only. Five different questions read by each child were presented via headphones at a comfortable level to two groups of normally hearing persons (experienced and inexperienced in listening to the speech of profoundly hearing-impaired children) to evaluate the speech intelligibility. The listeners could have a question repeated. The listener's task was to write down, in Swedish orthography, the one-word answer to the question. If the answer was correct, the question was counted as correctly understood. Only completely correctly understood questions were counted as correct.

4.3.3. Listeners

Twenty-four normally hearing persons listened to the speech material.
Sixteen of the listeners belonged to the staff of the department of Speech, Music and Hearing, KTH and were inexperienced in listening to the speech of profoundly hearing-impaired children. However, most of them were phonetically trained and had some experience of pathological speech. Eight of the listeners were speech therapists of profoundly hearing-impaired children and hence experienced in listening to this kind of speech.

4.4. Results

4.4.1. Speech intelligibility scores of 11 profoundly hearing-impaired children

Figure 4.1 shows the results of the intelligibility test averaged across all 24 normal-hearing listeners.

(Bar chart: measured speech intelligibility, %, for each of the 11 profoundly hearing-impaired children.)

Figure 4.1. Speech intelligibility scores of 11 profoundly hearing-impaired children. The figure shows number of correctly perceived Helen-questions in percent by 24 normal-hearing listeners.

Each bar in the figure is based on 120 responses. The speech of three of the children was unintelligible, as the listeners could only understand 7-34% of their questions. Two of the children were semi-intelligible: 57-63% of their questions were understood. Six children were assessed to be intelligible as they were able to make as much as 74-98% of their questions understood by the listeners. Consequently, these profoundly hearing-impaired children with pure tone averages at 0.5, 1, and 2 kHz between 90 and 108 dBHL in the better ear showed vastly varying intelligibility scores. The result indicates that the eleven children covered the range from good to poor speakers, thus confirming the selection of speakers made in this study.

4.4.2.
The effect of listeners’ experience

Figure 4.2 shows the speech intelligibility of each child across experienced and inexperienced listeners. The average result for all questions and all speakers was 60% for all listeners, 54% for inexperienced and 65% for experienced listeners.

(Bar chart: measured speech intelligibility, %, for each of the 11 profoundly hearing-impaired children, shown separately for experienced and inexperienced listeners.)

Figure 4.2. Speech intelligibility scores of 11 prelingually profoundly hearing-impaired children. The figure shows number of correctly perceived Helen-questions in percent by experienced and inexperienced listeners.

The result shows that each child’s speech fell into the same category (intelligible, semi-intelligible or unintelligible) for all listeners, independent of experience. However, the experienced listeners understood each child better than the inexperienced listeners, except for child 1 and child 5, who were equally intelligible to both types of listeners.

4.4.3. Relation between amount of residual hearing and speech intelligibility

To investigate whether the amount of residual hearing is also predictive of the speech intelligibility of profoundly hearing-impaired children with pure-tone averages of 90-108 dBHL, the relation between the amount of residual hearing and the children's speech intelligibility was studied. The index most commonly used to indicate the amount of hearing of a hearing-impaired person is the better-ear average of pure-tone thresholds at 500, 1000 and 2000 Hz.

(Scatter plot: speech intelligibility, %, against pure tone averages, dBHL; r = 0.003.)

Figure 4.3. Correlation between the intelligibility of 11 profoundly hearing-impaired children’s speech and their mean hearing loss for the frequencies 500, 1000, and 2000 Hz.
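The two quantities behind Figure 4.3, the better-ear pure-tone average and the correlation coefficient r, can be sketched in a few lines. This is a minimal illustration and not the procedure actually used in the study: the function names are hypothetical, the coefficient is assumed to be Pearson's r (the text does not name the statistic), and the per-child intelligibility scores have to be supplied by the reader because the text reports them only as ranges.

```python
from math import sqrt

# Frequencies used for the pure-tone average (PTA), as stated in the text.
PTA_FREQUENCIES_HZ = (500, 1000, 2000)

# Better-ear PTA values (dBHL) for children 1-11, taken from Table 4.1.
PTA_TABLE_4_1 = [90, 92, 92, 95, 95, 98, 100, 103, 108, 108, 108]


def pure_tone_average(thresholds_db_hl):
    """Mean hearing threshold (dBHL) over 500, 1000 and 2000 Hz for one ear.

    thresholds_db_hl maps frequency in Hz to threshold in dBHL.
    """
    values = [thresholds_db_hl[f] for f in PTA_FREQUENCIES_HZ]
    return sum(values) / len(values)


def better_ear_pta(left_ear, right_ear):
    """PTA of the better ear, i.e. the ear with the lower average threshold."""
    return min(pure_tone_average(left_ear), pure_tone_average(right_ear))


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed statistic) of two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical audiogram for one child (illustrative values only).
left = {500: 90, 1000: 95, 2000: 100}
right = {500: 85, 1000: 90, 2000: 95}
print(better_ear_pta(left, right))
```

Given a list `scores` with one measured intelligibility percentage per child, in the order of Table 4.1, `pearson_r(PTA_TABLE_4_1, scores)` would give the kind of coefficient shown in the figure.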
Figure 4.3, however, shows no correlation between speech intelligibility and PTA, which means that the intelligibility cannot be predicted from the degree of hearing loss, as measured by pure-tone audiometry. This is most likely because the pure tone audiogram gives insufficient information about speech processing capabilities. For this reason, a pure tone audiogram is not a good estimate of a profoundly hearing-impaired child's possibilities to develop intelligible speech.

4.4.4. Relation between shape of audiogram and speech intelligibility

It has been discussed whether or not the shape of the audiogram, rather than the degree of the hearing loss, may be the main factor which affects the intelligibility of hearing-impaired speakers. In general, flat audiograms are said to be associated with high speech intelligibility scores and falling audiograms with low speech intelligibility scores. Markides (1985) investigated the relationship between shape of audiogram and rated speech intelligibility of children with similar average hearing loss of about 50 dB in their better ear. Six groups of children represented different types of audiogram shapes. However, he found no significant correlation between audiogram shape and rated speech intelligibility in that group of hearing-impaired children.

To investigate the relationship between the shape of the audiogram and the children’s speech intelligibility we used the method of Risberg & Martony (1970), shown before in Figure 2.3, to classify the children's pure tone audiograms. However, the shapes of all audiograms were falling, varying among the children from B3 to C6, see Figure 4.4. Six of the children's audiograms were in the areas C4-6, which means in ranges where sound might be perceived through vibrotactile rather than auditory receptors.
However, no clear relation existed between speech intelligibility and quality of their residual hearing, classified according to Risberg & Martony. Except for one child (B3 with an intelligibility score of 98%), there was no indication that a child with a classification of C4-6 should be a poorer speaker than a child with a classification of B4-C3, see Figure 4.4 (cf. Figure 4.1).

(Bar chart: measured speech intelligibility, %, by 24 normal-hearing listeners for the 11 profoundly hearing-impaired children, ordered by audiogram class: B3, B4, B4, C3, C3, C4, C4, C4, C5, C6, C6.)

Figure 4.4. Speech intelligibility and shape of audiogram classified by the method of Risberg & Martony (1970).

4.4.5. Relation between functional hearing and speech intelligibility

The functional hearing of a child, that is the degree to which he can use his residual hearing for speech perception and control of his own speech production, depends on his ability to perceive temporal and spectral information in speech, like the ability to perceive gap durations and small differences both in frequency and intensity. Speech processing capabilities are of course more appropriate to measure by means of a speech test than by pure tones. Since the range of speech reception skills in profoundly hearing-impaired children is quite limited and since they have little experience in using speech, speech material can be difficult to utilize. Sentences might contain words that they do not know or difficult grammatical constructions. Speech tests especially designed for this group should be the best to use. However, no computerised tests have so far been developed in Swedish for use with young children who have difficulties in perceiving and producing speech.
Several researchers have shown (Cramer & Erber, 1974; Erber, 1974a; Gustafsson, 1984) that a simple spondee recognition test, for want of something better, can give valuable information about a profoundly hearing-impaired child's ability to perceive speech. To investigate if the use of residual hearing could predict the intelligibility of profoundly hearing-impaired children's speech, the functional hearing of each child was measured by means of a speech test consisting of twelve common spondaic words (Risberg et al., 1977), familiar to the children.

(Bar chart: correctly perceived words, %, for each of the 11 prelingually deaf children.)

Figure 4.5. The figure shows functional hearing of each child, that is the percentage of correctly perceived spondaic words of each child. Each bar represents 24 words.

The twelve words were presented twice (24 presentations) to the children via headphones at a comfortable level. The children answered by pointing to pictures that illustrated the test words. Figure 4.5 shows the functional hearing of the eleven children, given as the percentage of correctly perceived spondaic words for each child.

The relation between measured speech intelligibility (see section 4.4.1) and functional hearing of the children is shown in Figure 4.6. A positive correlation of 0.73 was found between functional hearing and speech intelligibility averaged over all listeners. The correlation between functional hearing and speech intelligibility scores was 0.74 for experienced listeners and 0.70 for inexperienced listeners. This indicates that the result of a simple speech test is a moderately good predictor of a profoundly hearing-impaired child's ability to develop intelligible speech.

(Scatter plot: speech intelligibility, %, against correctly perceived spondaic words, %; r = 0.728.)

Figure 4.6.
Correlation between the speech intelligibility of 11 prelingually profoundly hearing-impaired children and their ability to correctly perceive twenty-four spondaic words.

The results of this study indicated that there is no clear correlation between increasing hearing loss and decreasing speech intelligibility for children with pure tone averages above 90 dBHL. As stated above, it is insufficient to rely on pure-tone audiometry when estimating a profoundly hearing-impaired child's possibilities to understand and develop intelligible speech, as it gives little reliable information about the child's speech processing capabilities. The sort of speech a profoundly hearing-impaired child develops depends not only on the amount of hearing, as measured by pure-tone thresholds, but also on the quality of the hearing sensation and the use the child through training has been able to make of his residual hearing.

The degree to which a profoundly hearing-impaired child can use his residual hearing for speech processing is of course only one of many factors affecting the child's speech development. Levitt (1987) has shown that there also exists a strong relationship between speech intelligibility and age at onset of hearing loss, early special education, use of hearing aid and additional handicaps.

4.5. Functional hearing and speech perception tests

4.5.1. Introduction

There are several speech perception tests that can be used with prelingually and profoundly hearing-impaired children and children with specific language impairment to assess their speech processing capabilities: the GASP test (Erber, 1977), the Merklein Test (Merklein, 1981), Nelli (Holmberg and Sahlén, 2000) and the Maltby Speech Perception Test (Maltby, 2000).
Results from these tests provide information concerning education and habilitation that supplements the pure tone audiogram and the articulation index, because they indicate a person’s ability to perceive and to discriminate between speech sounds.

Martony et al. (Martony et al., 1972; Martony, 1974; Risberg, 1976) developed an analytical rhyme speech test to be used with profoundly hearing-impaired children for testing speech discrimination ability in four frequency ranges. The test method was based on acoustic differences between contrasting phonemes and was intended for use in hearing aid fitting to predict the difficulties an individual child might have in learning to use acoustic information. However, the method required that the children were able to read.

As no computerized analytical speech test based on illustrations had so far been developed in Swedish for use with young children who have difficulties in perceiving and producing speech, such a test was developed in cooperation with Risberg & Dahlqvist (Öster 2002a). The development of the computer-based analytical speech perception test was an effort to address the need for early diagnoses and for information about speech perception skills that supplements the pure tone audiogram. More importantly, a goal of this test was to measure the potential for children to produce intelligible speech given their difficulties in perceiving and producing speech. Therefore, this test was based on both acoustic and articulatory differences between contrasting phonemes.

The expectation was that the result of this test might give important recommendations for individual treatment and speech-training programs. The test seeks to evaluate the ability to perceive a range of sound contrasts used in the Swedish language.
The test is tailored for measurements with small children from four years of age, who have not yet learnt to read, by using easy speech stimuli, words selected on the basis of familiarity, and pictures that represent the test items unambiguously.

Profoundly hearing-impaired children with pure tone averages worse than 90 dBHL show very different abilities to learn speech and their potential to develop intelligible speech is unrelated to their pure tone audiograms. The development of this test was an effort to find a screening tool that can predict the ability to develop intelligible speech.

4.5.2. Decisive factors for speech tests with small children

The aim of an analytical speech perception test is to investigate how sensitive a child is to the differences in speech patterns that are used to define word meanings and sentence structures (Boothroyd, 1995). Consequently, it is important to use stimuli that represent those speech features that are phonologically important.

Since the speech reception skills in profoundly hearing-impaired children are quite limited, and since small children in general have a restricted vocabulary and reading proficiency, the selection of the speech material was crucial. The words selected had to be familiar and meaningful to the child, be represented in pictorial form and contain the phonological contrasts of interest. However, presenting sound contrasts as nonsense syllables, so that the perception is not dependent on the child’s vocabulary, was not a solution. It has been shown that nonsense syllables tend to be difficult for children to respond to and that they often substitute the nearest word they know (Maltby, 2000).
Other important factors to pay attention to were:

• what order of difficulty of stimulus presentation is appropriate
• what are familiar words for children at different ages and with different hearing losses
• what is the most unambiguous way to illustrate the chosen test words

Moreover, the task had to be meaningful, natural and well understood by the child; otherwise he/she will not cooperate. Finally, the test must rapidly give a reliable result, as small children have limited attention and motivation.

4.5.3. Test construction

The test contains illustrations of easy and familiar words. The words contain important phonological Swedish contrasts and each contrast is tested in one of eighteen different subtests by 6 word pairs presented twice. In Table 4.2 a summary of the test shows the phonological contrasts evaluated in each subtest, an explanation of the discrimination task and one example from each subtest. The words used were recordings of one female speaker.

An illustrated word (the target) is presented to the child on a computer screen together with the female voice reading the word. The task of the child is to discriminate between two subsequent spoken words, presented without illustrations, and to decide which one is the same as the target word, see Figure 4.7. The child answers by pointing with the mouse or with his/her finger to one of the boxes on the screen.

Figure 4.7. An example of the presentation of test stimuli on the computer screen. In this case the phonological contrast of vowel quantity is tested through the words tiger-tigger [ti˘gEr-tIgEr] (tiger-begging).

Table 4.2. The eighteen subtests included in the test.
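The scoring implied by this test design can be sketched as follows. With 6 word pairs presented twice, each subtest has 12 trials, and with two response alternatives the guessing level is 50%. The sketch below turns raw counts into a percent-correct profile. As an addition that is not part of the test as described, it also computes how likely a given score would be under pure guessing (a binomial upper tail). All function names and the example counts are hypothetical.

```python
from math import comb

TRIALS_PER_SUBTEST = 12   # 6 word pairs, each presented twice
CHANCE_LEVEL = 0.5        # two response alternatives per trial


def percent_correct(n_correct, n_trials=TRIALS_PER_SUBTEST):
    """Percent correct responses on one subtest."""
    return 100.0 * n_correct / n_trials


def probability_by_guessing(n_correct, n_trials=TRIALS_PER_SUBTEST, p=CHANCE_LEVEL):
    """Probability of at least n_correct correct answers when guessing.

    Assumption added for illustration: trials are independent and each
    is answered correctly with probability p (binomial model).
    """
    return sum(
        comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def result_profile(correct_by_subtest):
    """Map each subtest name to its percent-correct score."""
    return {name: percent_correct(n) for name, n in correct_by_subtest.items()}


# Hypothetical raw scores for three subtests (illustrative values only).
example_counts = {"Vowel quantity": 11, "Voicing": 6, "Nasality": 9}
for subtest, score in result_profile(example_counts).items():
    tail = probability_by_guessing(example_counts[subtest])
    print(f"{subtest}: {score:.0f}% correct, P(guessing) = {tail:.3f}")
```

With only 12 trials per subtest, scores near 50% cannot be told apart from guessing, which is one reason a profile across all subtests is more informative than any single subtest score.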
The results are presented as percent correct responses on each subtest, showing a profile of a child’s functional hearing (see Figure 4.9) that is useful for the speech therapist for screening purposes and gives good indications of the child’s difficulties in perceiving and subsequently producing the sounds of the Swedish language.

Figure 4.9. Example of a result profile for a child. Percent correct responses are shown for each subtest.

4.5.4. Preliminary results

During the development of this analytical test a special reference group was established consisting of speech therapists for normal-hearing children with specific language impairment, moderately hearing-impaired children and profoundly hearing-impaired children. The therapists tried the different versions and continually gave valuable advice concerning test procedure, way of response, type of interface, choice of illustrations, type of colours, how to show the results, etc.

Altogether 54 children of different ages and with different types of difficulty in understanding and producing speech took part in the development and evaluation of the different versions. Eighteen of the normally hearing children with specific language impairment were between 4 and 7 years of age and twelve were between 9 and 19 years of age. Nine of the children had a moderate hearing impairment and were between 4 and 6 years old and fifteen children had a profound hearing impairment and were between 6 and 12 years of age. Four of these had a cochlear implant and were between 6 and 12 years of age. Table 4.3 shows a summary of the children who tried out the program.

Table 4.3. Description of the children who participated in the development of the test. Hearing level (average of pure-tone hearing threshold levels at 500, 1000 and 2000 Hz), age and number of children are shown.
Normal-hearing children with specific language impairment:
  4-7 years of age      No. = 18
  9-19 years of age     No. = 12
Hearing-impaired children:
  < 60 dBHL, 4-6 years of age     No. = 9
  > 60 dBHL, 6-12 years of age    No. = 15

Figure 4.10 shows profiles for all 24 hearing-impaired children on some of the subtests. Black bars show mean results for the whole group and striped bars show the profile of one child with 60 dBHL pure tone averages at 500, 1000 and 2000 Hz. The result indicates that, on the whole, the child has greater difficulty perceiving important acoustical differences between speech sounds than the group of 24 children on average. Many of his subtest results were below the level of guessing (50%). A result of this kind might serve as a good screening of what the child needs to train in the speech clinic.

(Horizontal bar chart, scale 0-100, for the subtests: Number of syllables; Gross discrimination of long vowels; Vowels differing at low frequencies; Vowels differing at high frequencies; Vowel quantity; Discrimination of voiced consonants; Discrimination of voiceless consonants; Manner of articulation; Place of articulation; Voicing; Nasality.)

Figure 4.10. Results for the hearing-impaired children (N=24). Black bars show average results of the whole group and striped bars show the result of one child with 60 dBHL pure tone averages at 500, 1000 and 2000 Hz.

4.5.5. Conclusion

Using analytical speech tests to get a measure of a listener’s ability to understand or produce speech is important for many applications, e.g. in developing technical aids for transmitting speech through telephone or radio communications, in considering the acoustics of public halls, in establishing the impact of a sudden hearing loss on the ability to perceive and produce speech and in developing individual technical aids for the hearing-impaired.
The preliminary results reported here from the developmental phase of this speech perception test indicate that this type of computerised speech test might give valuable information about which speech sound contrasts a child with a hearing or speech disorder has difficulties with. The child’s results on the different subtests, consisting of both acoustic and articulatory differences between contrasting sounds, form a useful basis for an individual diagnosis of the child’s difficulties. This can be of great relevance for the work of the speech therapists.

The intention is that this test should be normalised to various groups of children so the result of one child could be compared to group data. This is useful supplementary information to the pure tone audiogram, especially for children with profound hearing losses. Hopefully it will meet the long-felt need for such a test for early diagnostic purposes in recommending and designing pedagogical habilitation programs for small children with difficulties in perceiving and producing speech.

A training part consisting of computerized game-like exercises is now under development. The training material will be based on the child’s difficulties shown in the test result profile.

5. Effects of speech input limitations on speech production

5.1. Deviations in the speech of moderately hearing-impaired children

According to the theory discussed above in section 2.2.3, speech is acquired by the developing child through the visual channel for imitation of speech movements and through the hearing channel for auditory control and self-correction. Therefore it is not surprising that the speech of hearing-impaired children often suffers from distortions and lack of articulatory precision.
However, these children develop spontaneous speech and are able to monitor their own production attempts through self-control and self-correction, depending on the severity of their hearing loss and the benefit and use of hearing aids. Often they need some help with some aspects of their own speech, for example the pronunciation of fricatives, correct pitch, and stress patterns.

The hearing loss of these children might be considered as a single filter (not compensating for other disordered perceptual processes) and their deviant articulation may be predicted from the audiogram. If the loss is prominent in the higher frequencies for instance, fewer errors will be made on vowels than consonants. Front vowels that differ at high frequencies will then be more deviantly produced than central and back vowels that differ at low frequencies. The possibility to perceive and the ability to produce unvoiced high-frequency fricatives will be affected and the place of articulation of consonants will often be confused.

Typical speech deviations of moderately hearing-impaired children can be classified as distortions, substitutions, omissions, and insertions. A distortion is a non-standard production. A substitution is when a standard phoneme replaces another phoneme. When a deviation is defined as an omission, a speech sound is not produced at all at a place where it should be. Finally, in the case of insertions, an improper addition of a speech sound is made.

5.2. Deviations in the speech of profoundly hearing-impaired children

The summary of common segmental, prosodic and voice disorders listed below is based on assessments made of the speech of Swedish prelingually and profoundly hearing-impaired children (Öster 1992a).
Vowels
• Simplified vowel systems
    incorrect tongue position but correct lip position
• Nasalized vowels

Consonants
• Substitutions of manner of articulation
    stop consonants for fricatives and nasals
    nasals for fricatives and laterals
    laterals for tremulants
    fricatives for laterals
• Substitutions of place of articulation
    fronting of velars and palatals
    backing of dentals and palatals
    substitutions between sibilants
• Voicing
    devoicing of voiced stops and fricatives
    voicing of unvoiced stops and fricatives
• Deletions and weakening of consonants
    deletion of laryngeal /h/ in initial position
    absence of air-stream of fricatives
    deletions of stops
    non-audible release of final stops
• Insertions
    fricatives and nasals are followed by stops produced at the same place of articulation
• Lack of co-articulation
    two identical sounds in word-final and word-initial position are not merged into one sound but pronounced as two sounds with a pause between

Rhythm
• Pausing errors
    pauses between words and/or syllables
• Slow tempo
• No vowel quantity differences
    preference for phonologically long vowels
• Lengthening of segments
• Incorrect stress patterns
    monotony
• Insertions
    a central vowel is often inserted between consonant clusters and after final consonants

Pitch and Voice Quality
• Restricted frequency range
• Vowel dependent fundamental frequency
• High pitch
• Tensed voice
• Breathy voice
• Intensity variations

Some of these deviations affect the meaning of words while others affect only the naturalness of the speech. All children showed several phonological deviations, some of which were shared by all subjects. However, the children differed as to the frequency with which they applied the processes and some processes occurred in the speech of only one or two children. Many of the deviations listed above can also be found in the simplified speech of young normally hearing children.
However, while these deviations disappear as the normal-hearing child matures, they will become fixed in the speech of profoundly hearing-impaired children if the children stay away from speech therapy.

5.2.1. Factors that cause deviations in the speech of profoundly hearing-impaired children

Some of the deviations found in the list above are unique to children with a profound hearing loss and are due to the reliance on visibility, impact of orthography, impact of different teaching methods, etiological aspects, educational aspects, and lack of physiological control.

Visibility

The impact of visibility on the speech of profoundly hearing-impaired children was shown by Öster (1992a), see Table 5.1. Aspects that are difficult to lip-read were related to characteristic deviations in the speech of profoundly hearing-impaired children.

Table 5.1. Relationship between non-visible speech elements and typical deviations in the speech of prelingually and profoundly hearing-impaired children (Öster 1992a).

NON-VISIBLE CUES:           TYPICAL DEVIATIONS:
Consonants:
  Place of articulation     Fronting and backing
  Manner of articulation    Stopping
  Voicing                   Voicing-errors
  Control of velum          Nasalization
Vowels:
  Tongue-position           Vowel reductions
  Quantity                  Preference for long vowels
Prosody:
  F0                        High pitch
  F0-variation              Monotony
  Rhythm                    Staccato-speech

As part of a course requirement, 59 students in phonetics took a listening test that illustrates the impact of visibility on a 17-year-old prelingually and profoundly hearing-impaired boy’s production of Swedish long vowels. His pure tone average was 92 dBHL. The students were asked to listen through a loudspeaker to recordings of the boy pronouncing syllables made up of /b/ in combination with each of the Swedish long vowels.
The students’ task was to identify how many of the nine Swedish long vowels the boy was able to differentiate between.

(Bar chart titled "Number of identified long vowels by 59 listeners": number of listeners against number of vowels, 1-9.)

Figure 5.1. Number of long vowels identified by 59 normally hearing listeners.

Figure 5.1 shows the result of the 59 listeners’ answers. It is obvious that the boy has a relatively restricted vowel system. The largest group of listeners, twenty-four persons, could identify 5 different vowels while nineteen other persons could differentiate between 4 vowels. Three persons identified 2 vowels, four persons heard 3 different vowels and six heard 6 vowels while three persons were able to identify as many as 7 different vowels.

The confusion matrix in Figure 5.2 shows that the vowels /O˘/ (57 out of 59), /E˘/ (55 out of 59) and /A˘/ (53 out of 59) were the vowels that the listeners identified the best. That means that the boy articulated these vowels very well. Most likely he controlled the rounding contrast between /O˘/ and /E˘/ thanks to the visibility of the difference in rounding. Moreover, all these vowels are produced with a relaxed and flat tongue and a rather visible open jaw position, in contrast to the vowels that the listeners had most difficulty identifying: /i:/ (14 out of 59), /e:/ (1 out of 59) and /u:/ (7 out of 59), which have a non-visible closed tongue position. /O˘/ and /E˘/ were also those vowels that the listeners most frequently heard instead of other vowels.

Figure 5.2. Identified long vowels by 59 normally hearing listeners. Produced vowels are shown vertically and identified vowels are shown horizontally. Bottom row shows number of times each vowel was identified (grey square) or heard instead of another vowel.
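The counts reported for Figures 5.1 and 5.2 can be summarised with a short calculation. The sketch below uses only the numbers stated in the text and reuses the vowel labels exactly as they are printed there. The function names are hypothetical, and the mean is an illustrative summary that the text itself does not report.

```python
N_LISTENERS = 59

# Figure 5.1 as described in the text:
# number of distinct long vowels heard -> number of listeners.
listeners_by_vowel_count = {2: 3, 3: 4, 4: 19, 5: 24, 6: 6, 7: 3}

# Figure 5.2 as described in the text: correct identifications out of 59.
# Keys are the vowel labels as printed in the text.
correct_identifications = {
    "O˘": 57, "E˘": 55, "A˘": 53,   # best identified vowels
    "i:": 14, "u:": 7, "e:": 1,     # most difficult vowels
}


def mean_distinct_vowels(distribution):
    """Average number of distinct vowels heard per listener."""
    total_listeners = sum(distribution.values())
    total_vowels = sum(k * n for k, n in distribution.items())
    return total_vowels / total_listeners


def identification_rate(n_correct, n_listeners=N_LISTENERS):
    """Percentage of listeners who identified the produced vowel."""
    return 100.0 * n_correct / n_listeners


# Sanity check: the reported groups account for all 59 listeners.
assert sum(listeners_by_vowel_count.values()) == N_LISTENERS

print(f"mean distinct vowels: {mean_distinct_vowels(listeners_by_vowel_count):.2f}")
for vowel, n in correct_identifications.items():
    print(f"/{vowel}/: {identification_rate(n):.1f}% identified")
```

On these figures the listeners heard between four and five distinct vowels on average, out of the nine Swedish long vowels.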
Lack of physiological control

The lack of physiological control during speech causes deviations like nasality, disordered voice production and, above all, disordered respiratory processes in speech, which affect the intelligibility of the speech of profoundly hearing-impaired children (Bench, 1992). Profoundly hearing-impaired children often have poor breath control during speech production. They use too much air per syllable, have a slow speaking rate, make pauses between syllables that result in an erratic rhythm, and tend to breathe without regard to linguistic boundaries. Their breathing often increases during speech, whereas it decreases during speech in normally hearing children.

Whitehead (1983) investigated the respiratory patterns during speech of ten deaf male adults with a pure tone average loss of 105 dB. Magnetometer coils were attached to the chest, one for the abdomen and one for the rib cage, and their signals were fed to an oscilloscope. Recordings were made when each subject read "the Rainbow Passage" (Fairbanks, 1960) at normal loudness and when the subjects talked about a topic of interest. The result showed that some deaf speakers did not inhale before speaking. As the intelligibility of their speech decreased, the volume of air in the lungs also decreased. Most of the subjects spoke on low lung volumes. Some of the subjects even initiated reading and conversation below the functional residual air capacity (FRC) without inspiration. When speaking below FRC, the speaker had to use a higher muscular pressure to achieve speech.

The impact of orthography

The impact of orthography also causes deviant pronunciations of speech sounds. Each language has its own sound system and sound patterns, i.e., specific rules of how to combine phonemes to build up meaningful words and utterances.
The knowledge of the phonological system of a specific language includes the knowledge of its pronunciation rules. The new teaching situation in the Swedish Deaf schools implies that speech-training methods nowadays are based on written Swedish and the use of sign language for instruction and explanation. This means that it is extremely important that the children are thoroughly familiar with the pronunciation rules (text-to-speech rules). Insufficient knowledge of these rules causes typical deviations. In particular, the various spellings of the phonemes /ʃ, ç, oː, ɛː/ give rise to some deviations, and the fact that two, and sometimes three, letters are pronounced as one sound in Swedish is not obvious to some children.

Teaching methods

It is well known that human speakers adjust their speaking habits or speaking styles to the communicative situation. Many references to this phenomenon exist in the literature, like the “hyper vs. hypo speech” of Lindblom and Moon (1991). This "over articulation" is not merely a louder version of normal speech, but might also involve an active reorganisation of phonetic gestures. When teaching profoundly hearing-impaired children articulation skills, the teacher often hyperarticulates to improve lip-reading. This might cause various deviations in the speech of the children as they tend to be overly sensitive to irrelevant visible variation.

Examples of visible interference were found in a study by Öster (1991, 1992b). It was shown through a detailed phonological analysis that deviant phone types represented different phonemes in a profoundly hearing-impaired child’s speech, despite a phonetic similarity to [b]. The child made contrasts between /p/ and /b/ in initial position through lip-protrusion instead of voicing.

Background variables

Levitt (1987) has shown that the effect of background variables on the speech of hearing-impaired children is of particular interest.
He emphasised the important role of special education and of an early and effective intervention. He divided the background variables into the following three groups:

• etiological
  - age at onset of hearing loss
  - profound hearing loss in the family
  - hearing level
• educational
  - age at onset of special education
  - age when hearing aid was first fitted
  - use of hearing aid
  - intelligence quotient
  - reading score
  - syntactic comprehension
• other variables
  - other handicapping conditions
  - home language
  - parental occupation
  - socio-economic status
  - number of siblings

5.3. Deviations in the speech of L2 learners

As stated above in section 1.1.2, L2 learners articulate some target phonemes deviantly because the number and/or the quality of the speech sounds of the mother tongue (L1) differ from those of L2 (Eskenazi, 1999), and typical pronunciation difficulties for a given target language will differ for speakers of different native languages (Dalby & Kewley-Port, 1999). For Swedish as the target language, some general segmental and prosodic deviations can be summarised from studies by Bannert (1990), McAllister (1986, 1995), Thorén (1994) and Öster (1998, 1999a) as follows:

Vowels: The Swedish language has as many as 9 phonologically long and 9 short vowels. The high number of Swedish front vowels and the fact that some of them are rounded often make it difficult for L2 speakers to have a complete productive and perceptual mastery of all these vowels. Moreover, high and mid-high long Swedish vowels are diphthongised in open syllables. However, diphthongisation of long vowels in open syllables is seldom found in L2-speech. Some of the front vowels in Swedish are rounded. It is often difficult for L2 speakers who do not have rounded front vowels in their native language to pronounce these vowels correctly.
Therefore they often substitute the Swedish front vowels (ö) [øː] and (u) [ʉː] with the rounded back vowels (å) [oː] and (o) [uː]. In their native language rounding is connected with a back placement of the tongue.

Consonants: Some of the Swedish consonants cause production and perception difficulties for L2 speakers, especially the voicing contrast between plosives, aspirated voiceless plosives, the great number of fricatives, the nasal velar sound and the fact that dentals are retroflexed when they are preceded by /r/.

Prosody: Quantity is an important contrast in Swedish, which is realised as a difference in duration and, for some pairs, also as a difference in vowel quality. Furthermore, the following consonant has the opposite quantity, i.e. long vowels are followed by short consonants and vice versa. It has been reported that quantity differences between phonologically long and short vowels are difficult for L2 speakers to master completely (Bannert, 1990; McAllister, 1986; Thorén, 1994; and Öster 1998, 1999a). Common problems with Swedish prosody are producing accent II and producing a long consonant after a short vowel in stressed syllables.

6. Interaction between individual deviations and speech intelligibility

6.1. Introduction

The effect of different phonetic deviations on the intelligibility of profoundly hearing-impaired speech has been studied using different techniques: correlational studies and qualitative studies (Hudgins and Numbers, 1942; John and Howarth, 1965; Monsen, 1983), manipulation by means of digital speech processing (Huggins, 1977; Kruger, Stromberg and Levitt, 1972; Maassen and Powel, 1984; 1985; Osberger and Levitt, 1979), or speech synthesis techniques (Bernstein, 1977; Öster 1985, 1990).
With digital speech-coding techniques, only suprasegmental deviations are easily manipulated, but in the case of speech synthesis, both segmental and suprasegmental factors can be manipulated.

The effect on intelligibility of different kinds of deviations from normal production has not been decisively established. Most of the previous studies have not examined the interaction between the deviations and the individual effect of a deviation on intelligibility. Hence, the reported effect on intelligibility has varied depending on which deviations were examined.

Gold (1980) claimed that, "Whereas there is much documentation of the kinds of segmental and suprasegmental errors in the speech of the hearing impaired, there is far less evidence of the direct effects of each of these error types on overall speech intelligibility." - "Thus, although we may be able to identify those errors which occur most frequently in the speech of the profoundly hearing-impaired, we need further research to indicate how these error types interact to reduce speech intelligibility and to determine which error types should be the first to be considered when planning a training program for improved speech production in the hearing-impaired child." (p. 415.)

In this chapter a study by Öster (Öster 1985; Granström & Öster 1994a, 1994b) is described where normally hearing subjects estimated the effects of individual segmental and prosodic deviations on intelligibility in the simulated speech of three children from the Manilla School for Deaf Children in Stockholm. The deviations of each child were ranked according to their effect on intelligibility, giving an indication of which deviations should be corrected first, as they affected the intelligibility to the highest degree. Based on the results of the listening tests, recommendations could be given regarding an individual order for efficient correction.
These findings stressed the importance of an individual assessment and an individual speech-training program for speech improvement.

6.2. The effects of individual deviations on speech intelligibility measured by means of synthetic speech

6.2.1. Introduction

The synthesis-by-rule system developed at the Department of Speech Communication and Music Acoustics, KTH (Carlson, Granström & Hunnicutt, 1982), was used in this study. With this technique it was possible to isolate segmental and/or prosodic deviations while leaving other features unchanged, and to study their effect on intelligibility and the interaction between them through listening tests with normally hearing subjects.

In the synthesis-by-rule system an arbitrary text is transformed into synthetic speech with the help of phonetic and phonological rules working on both the segmental and the prosodic level. On the segmental level the system is based on phonemes. The synthesis-by-rule system was judged to be useful for studying the impact of individual deviations on intelligibility, as our hypothesis was that profoundly hearing-impaired speakers have well-established speech habits, see section 2.2.3.

6.2.2. Assessment of the speech of three profoundly hearing-impaired children

The children were educated mainly by sign-language. Their age at the time of recording was 11 to 15 years. Their pure tone averages were around 90 dBHL in the better ear (0.5, 1, 2 kHz). It was apparent that the children had some residual hearing. Their speech was recorded when they read monosyllabic words, polysyllabic words, and a coherent text. In the words all the Swedish vowels and consonants in all phonotactically possible positions were represented.
A phonetically trained listener transcribed the segmental production broadly, and the fundamental frequency and its range, pausing, speech rate, segment duration, and stress-patterns were analysed in the coherent text by means of oscillographic recordings.

The assessment of the speech of the three children was made through traditional error analysis that did not pay any attention to contrastive function. The transcription was broad, and the deviations made by the children were translated into simplified phonetic rules. In this study, the intention was not to investigate what the speech of the children did express, but rather to gain knowledge of the relative effect on intelligibility of a specific phonetic deviation. Below, a systematic description of the three children’s segmental and prosodic deviations is presented, which formed the basis of the programmed phonetic rules that generated the synthetic speech.

Consonant deviations

The plosives of child A were all voiced except /g/, which was unvoiced. All the /s/ and /ɧ/ sounds coincided with the pronunciation of [ç]. There were no nasals, and /m/ and /n/ were produced at the right place but in a wrong manner. The retroflexed consonants were not articulated as one sound but as two.

Child B had plosives that were voiced in initial position and unvoiced and aspirated in final position, except for /g/, which was nasalized in medial and final position, see Table 6.1. The sibilants were very indistinct and preceded or followed by /k/.

Table 6.1. Description of the realized consonant systems of three profoundly hearing-impaired children in initial, medial and final positions. Shaded areas pertain to phonotactically impossible positions.
Child A Target p t k b d g f v s Ó C ∆ k f f C C C d k f f C f f C h m n d h b d C d b d C d b d N l r l {C C l {C C l {C Initial position b d k b d Medial position b d k b Final position b d k b d k k b d g f v s g g f f Sk Sk kS g h b t Ng f f Sk Sk g th Ng fh fh S1k g f v s Ó C ∆ h f f ts C C C h f f t C f f C C
Child B Target p t Ó C ∆ h m n N l r g “ m Ng Ng g “ m l´ “ l r l r Initial position b g g b g Medial position p t k p Final position ph th kh ph S1k Ng Ng m n N mb nd C mb nd N l r C mb nd N lE r
Child C Target p t k b d g Initial position p t k p t k Medial position p t k p t k Final position p t k p t k

The dental sounds /j/ and /l/ were often realized as a voiced velar sound. Many of the consonants were produced far back in the mouth. The retroflexed dental consonants were articulated as a sequence of /r/ and a dental, in accordance with the orthographic representation.

The plosives of child C were unvoiced and some fricatives sounded like [ç]. Like child B, this child produced some consonants differently depending on the position in the word. Most of the nasals were followed by a plosive produced at the same place. Retroflexed consonant clusters were produced as separate sounds.

Vowel deviations

Tables 6.2 - 6.4 give a detailed description of the children’s vowel systems.

Table 6.2-6.4. Description of the realised vowel system of three profoundly hearing-impaired children.
Key word | Target | Child A realised as | Child B realised as | Child C realised as
Sil (strainer) | iː | ɛ | ə | iː
Sill (herring) | ɪ | ɛ | ə | ɪː
Vet (know) | eː | ɛ | eː | eː
Vett (sense) | ɛ | ɛ | ɛ | ɛː
Säl (seal) | ɛː | ɛ | ɛː | æː
Säll (blissful) | ɛ | ɛ | ɛ | ɛː
Här (here) | æː | æ | æː | æː
Herr (Mr) | æ | æ | æ | æː
Syl (pricker) | yː | œ | ʏː | iː
Syll (sill) | ʏ | ɵ | ʏ | ʏː
Föl (foal) | øː | œ | œː | øː
Föll (fell) | ø | œ | œː | œː
För (prow) | œː | œ | œː | œː
Förr (before) | œ | œ | œ | œː
Hus (house) | ʉː | œ | oː | ʉːw
Hund (dog) | ɵ | ɵ | ɵ | ʏː
Rot (root) | uː | ɔj | ɵː | əuːw
Rott (rowed) | ʊ | ɔj | o | œː
Gå (go) | oː | oː | ɵː | əoː
Gått (gone) | ɔ | ɔj | ɵ | œː
Hat (hate) | ɑː | aː | ɑː | aː
Hatt (hat) | a | a | aː | a

Child A reduced the Swedish vowels to six short and two long vowels. All vowels were nasalized. All close vowels were pronounced half-opened.

Child B had nine long and seven short vowels. Deviations were found in both quantity and quality. All back vowels were realized as central [ɵ]. The child’s lip positions were correct, while the tongue positions were sometimes incorrect. This shows the impact of lip-reading on the speech of severely hearing impaired children.

Child C had as many as twelve long vowels and only one short vowel. Unrounded vowels were nasalized and some rounded vowels were diphthongized.
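The vowel counts stated for the three children can be checked against the realised vowels listed in Tables 6.2-6.4. The sketch below is only an illustration: the lists follow the key-word order of the tables (Sil, Sill, Vet, Vett, and so on to Hat, Hatt), the transliteration of the symbols into standard IPA is an assumption, and a vowel is treated as long simply when its symbol carries the length mark.

```python
LENGTH_MARK = "ː"

# Realised vowels per child in key-word order (Sil, Sill, ..., Hat, Hatt).
# Symbol choices are an assumed transliteration; only the length mark
# matters for the counts below.
realised = {
    "A": ["ɛ", "ɛ", "ɛ", "ɛ", "ɛ", "ɛ", "æ", "æ", "œ", "ɵ", "œ", "œ",
          "œ", "œ", "œ", "ɵ", "ɔj", "ɔj", "oː", "ɔj", "aː", "a"],
    "B": ["ə", "ə", "eː", "ɛ", "ɛː", "ɛ", "æː", "æ", "ʏː", "ʏ", "œː", "œː",
          "œː", "œ", "oː", "ɵ", "ɵː", "o", "ɵː", "ɵ", "ɑː", "aː"],
    "C": ["iː", "ɪː", "eː", "ɛː", "æː", "ɛː", "æː", "æː", "iː", "ʏː", "øː", "œː",
          "œː", "œː", "ʉːw", "ʏː", "əuːw", "œː", "əoː", "œː", "aː", "a"],
}


def inventory_size(vowels):
    """Return (number of distinct short vowels, number of distinct long vowels)."""
    distinct = set(vowels)
    long_vowels = {v for v in distinct if LENGTH_MARK in v}
    return len(distinct - long_vowels), len(long_vowels)


counts = {child: inventory_size(vowels) for child, vowels in realised.items()}
for child, (n_short, n_long) in counts.items():
    print(f"Child {child}: {n_short} short, {n_long} long")
```

Counting distinct realisations in this way gives six short and two long vowels for child A, seven short and nine long for child B, and one short and twelve long for child C, in agreement with the description in the text.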
Prosodic deviations

Child A had a monotonous rhythm due to incorrect pausing between syllables. Her fundamental frequency was extremely high with a lowering at the end of every vowel. The child also had extended segment duration.

Child B had a normal speaking rate. However, the speech was not fluent, because he breathed after every second word. He emphasised the beginning and the end of every sentence. The most interesting deviation was his vowel-dependent fundamental frequency variation, caused by an excessive articulatory tension.

Child C had a normal fundamental frequency but a remarkably slow tempo. This child made a pause after every word and extended the last syllable in every phrase. Furthermore, he extended the occlusion phase in the production of the plosives /p, t, k/.

The prosodic deviations of each child formed the basis for the rules used to generate the prosody of the simulated speech.

6.2.3. Listening test

Based on the above assessments, simplified phonetic rules were constructed that generated the synthetic speech for listening tests. The speech of each child was represented in 32 sentences. Every sentence contained four key words. Every deviation or combination of deviations was presented in groups of four sentences (16 key words). Table 6.5 shows the different combinations of deviations used in the listening test, corresponding to corrections of the different deviations.

The listeners were 21 students at the Department of Linguistics, Stockholm University. They listened through earphones to one child at a time with a short interval between each child. Every sentence was presented twice. The listeners wrote down what they understood. The test started with eight synthesized sentences with no deviations to familiarize the listeners with synthetic speech.

Table 6.5.
Combination of deviations used to simulate the speech of three profoundly hearing-impaired children and the corresponding corrections of deviations measured through a listening test.

Type of deviations | Corrections of deviations
No deviations | All deviations corrected
Vowel deviations | Consonants and prosody
Consonant deviations | Vowels and prosody
Prosodic deviations | Vowels and consonants
Vowel & consonant deviations | Prosody
Vowel & prosodic deviations | Consonants
Consonant & prosodic deviations | Vowels
Vowel & consonant & prosodic deviations | No corrections

6.2.4. Results

Figures 6.1-6.3 show the result of the listening test of the simulated speech of the three children. The deviations are arranged according to increasing speech intelligibility. The results are classified in three groups of intelligibility:

• Unintelligible speech = 0-25% correctly identified key words
• Semi-intelligible speech = 25-50% correctly identified key words
• Intelligible speech = 50-100% correctly identified key words

Based on these results some recommendations could be given regarding the speech training of the children. The speech of child A was unintelligible as the intelligibility was 17%. The vowels should be corrected first, increasing the intelligibility to 51%. After that, her consonants should be trained. Without consonantal deviations, the intelligibility would increase further to 66%. Finally, the prosody deviations should be corrected.

The simulated speech of child B was also unintelligible with only 7% intelligibility. The consonant deviations had a seriously deteriorating effect on the intelligibility. If these are corrected, the intelligibility will increase to 61%. Thereafter, the vowels should be trained. With the vowels corrected the intelligibility will further increase to 97% and the speech will be completely intelligible.
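The test design of Table 6.5 and the three intelligibility groups can be sketched in a few lines. This is only an illustration under stated assumptions: the function and variable names are hypothetical, and because the groups as given share their limits (25% and 50%), assigning a limit value to the higher group is a choice made here, not something the text specifies.

```python
from itertools import combinations

DEVIATION_TYPES = ("vowel", "consonant", "prosodic")


def conditions():
    """All 2**3 test conditions as (deviations kept, deviations corrected)."""
    result = []
    for k in range(len(DEVIATION_TYPES) + 1):
        for kept in combinations(DEVIATION_TYPES, k):
            corrected = tuple(d for d in DEVIATION_TYPES if d not in kept)
            result.append((kept, corrected))
    return result


def intelligibility_group(percent):
    """Map a key-word score in percent to one of the three groups.

    Assumption: the limit values 25 and 50 are assigned to the higher group.
    """
    if percent < 25:
        return "unintelligible"
    if percent < 50:
        return "semi-intelligible"
    return "intelligible"


conds = conditions()
sentences_per_child = len(conds) * 4      # four sentences per condition
key_words_per_child = sentences_per_child * 4  # four key words per sentence

# Scores reported in the text for the uncorrected simulated speech.
uncorrected = {"A": 17, "B": 7, "C": 64}
for child, score in uncorrected.items():
    print(child, score, intelligibility_group(score))
```

Eight conditions with four sentences each give the 32 sentences per child mentioned in the description of the listening test.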
[Figure 6.1: Effects of corrections on intelligibility - Child A. Y-axis: % intelligibility (0-100). Corrections shown, from left to right: Prosody, No, Cons, Vow+Pros, Vowels, Vow+Cons, Cons+Pros, All.]

[Figure 6.2: Effect of corrections on intelligibility - Child B. Y-axis: % intelligibility (0-100). Corrections shown, from left to right: No, Vowels, Prosody, Vow+Pros, Cons, Cons+Pros, All, Vow+Cons.]

[Figure 6.3: Effects of corrections on intelligibility - Child C. Y-axis: % intelligibility (0-100). Corrections shown, from left to right: Pros, No, Cons+Pros, Vow+Pros, Vowels, Cons, Vow+Cons, All.]

Figure 6.1-6.3. Intelligibility of synthesized versions of the speech of the hearing-impaired children. Corrections of vowel (VOW), consonant (CONS) and prosodic (PROS) deviations have been simulated.

Finally, the prosodic deviations should be corrected, but not motivated primarily by an expected increase in intelligibility.

However, the simulated speech of child C was intelligible (64%). If the consonantal deviations were corrected first, the intelligibility would be as high as 91%. Then a correction of the vowel deviations would increase the intelligibility to 96%.

An interesting result of this study was that the effect of the prosodic deviations on intelligibility was negligible, perhaps due to the short sentences. For child A and child C the intelligibility even decreased when only the prosodic deviations were corrected. This was probably owing to the slow speaking rate and the frequent pausing that gave the listener more time to segment and interpret the meaning of the sentences. Thus, a slow tempo has a positive effect on intelligibility when the speech contains grave segmental deviations. It makes the speech unnatural but still not unintelligible.

6.2.5.
Conclusions

One conclusion drawn from this study is that a correction of errors in the segmental articulation of speech is probably necessary before a significant improvement in intelligibility can be achieved. An improvement in the prosodic elements alone does not result in an immediate gain in intelligibility - it may instead improve the naturalness and overall quality of the speech. This had already been shown by Osberger and Levitt (1979) and later by Maassen and Powel (1985) in studies where temporal deviations in sentences spoken by profoundly hearing-impaired children were artificially corrected towards normal speech through digital manipulation. The samples were then subjected to intelligibility tests. It was observed that segmental deviations, as opposed to prosodic deviations, were primarily responsible for reducing the intelligibility of the speech of profoundly hearing-impaired children. Only when the segmental articulation was satisfactory could an improvement in intelligibility be obtained from the correction of prosodic deviations.

Another important conclusion is that each child’s speech production is unique. Therefore, it is extremely important that an individual diagnosis is made prior to therapy to find those deviations that affect the intelligibility of the speech. A complete diagnostic assessment includes both an analysis of the existing articulation skills and an analysis of how the articulation is realized in linguistic use. Because profoundly hearing-impaired children depend on lip-reading when learning speech, it is also important to investigate how the interference of visibility affects their possibilities to realize phonological contrasts in the Swedish language. Methods to analyze deviant speech to assess what the speech expresses are discussed in the next chapter.

7. Phonetic realizations of phonological systems

7.1.
Introduction

Several studies of the phonological systems of hard-of-hearing children (West & Weber, 1973; Abberton, Fourcin, Hazan, 1985; Oller & Kelly, 1974; Dodd, 1988; Oller & Eilers, 1981; Öster 1991, 1992a, 1992b, 1995c) have shown that, although profoundly hearing-impaired children make deviations in production, they possess some kind of abstract and stable phonological system.

Studies of the phonological systems of children with profound hearing losses have been inspired by Clinical Phonology (Hodson, 1980; Shriberg & Kwiatkowski, 1980; Grunwell, 1987; Ingram, 1989), which was developed to analyse phonological processes in disordered speech through descriptions of contrasts and processes. According to Grunwell (1987), there are five major clinical assessment procedures that are based on a phonological process analysis:

• Phonological Process Analysis (PPA; Weiner, 1979)
• Natural Process Analysis (NPA; Shriberg and Kwiatkowski, 1980)
• Assessment of Phonological Processes (APP; Hodson, 1980)
• Procedures for the Phonological Analysis of Children's Language (PPACL; Ingram, 1981)
• Phonological Assessment of Child Speech (PACS; Grunwell, 1985)

Linguistic theories such as Taxonomic Phonemics and Generative Phonology, Jakobson's Child Phonology theories (1968) and Stampe's Natural Phonology (1979) have contributed to the development of Clinical Phonology. Taxonomic Phonemics, developed in the 1940's and 1950's, classifies contrasting sound units or phonemes which cause a difference in meaning. Generative Phonology, described by Chomsky and Halle in the 1960's, uses distinctive features and formal rules to describe the sound patterns of a language. Jakobson (1968) claimed that children learn contrasts, not individual sounds, in a certain order from maximal to minimal contrast. Stampe (1979) stressed that small children have a tendency to simplify adult speech by innate rules or processes.

7.2.
Speech assessment methods

All assessment is time-consuming and laborious. However, to prevent speech therapy from becoming a series of meaningless "articulatory gymnastics" sessions, the therapist must be prepared to spend time carrying out an appropriate diagnosis prior to therapy. Therapy and assessment are inseparable, as assessments are required regularly during the training program.

A deviation may be looked upon as a phonetic deviation or a phonological deviation. A phonetic deviation affects the naturalness of the speech and could be the result of an incorrect phonation or an articulatory movement that has no effect on the speaker's ability to signal meaning differences. A phonetic deviation is defined as a phonological deviation if it changes the meaning of the word. Phonological deviations affect the intelligibility of a child’s speech and cause meaning differences in spoken language. In speech therapy, it is most important to concentrate first on the treatment of phonological deviations in order to improve intelligibility, and then to focus on phonetic deviations to improve the naturalness of the speech.

7.2.1. Phonetic error analysis

As has been stated in the Introduction, deviant speech is traditionally assessed using phonetic error analyses only. A phonetic error analysis is appropriate for less deviant speech, for instance for assessment of the speech of moderately hearing-impaired children, as the articulation skill is compared with that of normally hearing speakers without paying attention to phonetic contexts or to the production’s contrastive function in a specific language. Such an analysis uses a coarse phonetic transcription, missing important articulatory details, and provides only information on what the client is not capable of articulating. Hence, only speech sounds that a client never articulates correctly will be treated.
A phonetic error analysis takes for granted that the sounds that the child articulates correctly in isolation, the existing articulation skills, also are used correctly. Obviously, this approach provides insufficient and misleading information on which to base an effective speech therapy program with more deviant speech, as it pays no attention to either the usage of the existing articulation skills or to whether a deviant articulation might signal a "correct" contrast.

Profoundly hearing-impaired speakers are dependent on visual cues and often try to realise the visual representation of a phonetic contrast that is signalled auditorily. This means that a deviant production may be an attempt to realise a speech sound contrast (Monsen, 1976) as discussed in sections 2.2.3 and 7.3.3.

In a study by Öster (1989a) it was shown that profoundly hearing-impaired children understand the phonological contrasts between visually similar consonants but have difficulties in realising them correctly. Systematic deviations from normal were found in the speech of a 15-year-old, prelingually and profoundly hearing-impaired boy, educated using sign language, when he produced Swedish stops. His pure tone average in the better ear was 108 dBHL. A systematically deviant contrast was found in initial position between voiced and unvoiced bilabial stops and in final position between voiced and unvoiced dental and velar stops. The child contrasted unvoiced stops from their voiced cognates in initial position by lip protrusion and in final position by adding a neutral vowel instead of a voicing contrast.

7.2.2.
Phonological analysis

Nowadays, the assessment of exceptionally deviant speech, such as the speech of profoundly hearing-impaired children and some types of L2-speech, concentrates more on deeper aspects of speech production, such as the linguistic use of the existing articulation skills in all possible positions and the way in which the child's phonetic realisations differ from the normal model. This is done through phonological analyses.

According to Grunwell (1987), children’s speech problems are largely confined in effect to the patterns of consonantal usage. This is very obvious for profoundly hearing-impaired children, who develop systematic contrasts for consonants, which however differ from those of normally hearing speakers, principally due to an increased reliance on visual cues.

To identify the phonetic deviations that affect a child’s ability to signal meaning differences in spoken language, a very detailed transcription must be used and an expansion of IPA symbols and diacritics is necessary (see Appendix I). As concerns vowels, deviant speech most often involves systematic substitutions, which is why a coarse transcription provides sufficient information.

If speech therapy is carried out without awareness of the child's phonological system, it may destroy already built-up couplings between abstract entities and articulation. This might even result in a decreased intelligibility of the speech after training owing to a reduced number of contrasts.

A speech therapy program based on a phonological assessment will be more directed towards training of distinctive features and of the correct production of contrasts between often visually similar consonants, rather than towards a correct pronunciation of a specific sound. When a deviantly produced distinctive feature is corrected, the articulation of all speech sounds that contain this feature will improve.
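The last point, that correcting one distinctive feature benefits every speech sound containing it, can be illustrated with a small sketch. The feature table below is a deliberately simplified, hypothetical description of six Swedish plosives (place of articulation and voicing only); it is not the feature system used in the assessments described in this thesis, and the function names are illustrative.

```python
# Simplified, assumed feature sets for six plosives.
FEATURES = {
    "p": {"plosive", "bilabial"},
    "b": {"plosive", "bilabial", "voiced"},
    "t": {"plosive", "dental"},
    "d": {"plosive", "dental", "voiced"},
    "k": {"plosive", "velar"},
    "g": {"plosive", "velar", "voiced"},
}


def sounds_with(feature):
    """Speech sounds whose description contains the given feature."""
    return sorted(s for s, feats in FEATURES.items() if feature in feats)


def contrasts_carried_by(feature):
    """Pairs (a, b) where b equals a plus exactly the given feature."""
    pairs = []
    for a, feats_a in FEATURES.items():
        for b, feats_b in FEATURES.items():
            if feats_b - feats_a == {feature} and not (feats_a - feats_b):
                pairs.append((a, b))
    return sorted(pairs)


print(sounds_with("voiced"))
print(contrasts_carried_by("voiced"))
```

In this toy inventory, training the voicing feature once affects /b, d, g/ together and restores the three contrasts /p/-/b/, /t/-/d/ and /k/-/g/, rather than requiring each sound to be trained separately.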
In the following sections, phonological analyses are described of a profoundly hearing-impaired child’s consonant production and of an L2-speaker’s (Bosnian speaker) vowel production.

7.3. Description of a phonological analysis of a profoundly hearing-impaired child’s consonant production

Öster (1995a) describes a phonologically based assessment method on which to base speech therapy. The analysis outlines a child's unique phonological system (phonetic and phonological inventories) and shows what the child’s speech expresses. It is done in three steps, which are described below. First, the individual child's existing articulation skills are assessed. After that, the child’s usage is analysed. Finally, the idiosyncratic realisations of phonological contrasts are analysed through the detailed phonetic transcription. The outcome of such a complete assessment provides significant and valuable information to base the therapy on.

As an example of this method, the video-recorded speech of a fifteen-year-old child, educated using sign language, will be phonologically analysed below (Öster 1995a). His pure-tone-average (at 0.5, 1 and 2 kHz) in the better ear was 108 dBHL. Obviously, speech information was not perceived at all without a hearing aid. The child was video-recorded saying 58 polysyllabic Swedish words. Within the constraints of the Swedish phonotactic system, these words sampled each Swedish consonant at least twice in the initial, medial, and final positions. Consequently, each phoneme was sampled at least 6 times. The video-recorded speech was transcribed in detail using IPA symbols and some of the specific diacritics used by Bush et al. (1973), Grunwell (1987), and Roug, Landberg, and Lundberg (1989), which were developed to transcribe babbling and phonetic development in early infancy, see Appendix 1.

7.3.1.
Step 1: Analysis of the existing articulation skills Many children articulate several speech sounds correctly in isolation but have problems in producing them contrastively in various contexts. Therefore it is important to concentrate upon those speech sounds which the child can produce in isolation or in syllables, but which are not correctly realised in a linguistic context. These speech sounds should be established in all contexts before attempts are made to introduce the speech sounds, which are not yet within the child's productive inventory. By assessing the existing articulation skills, it is also possible to exclude a motor disorder as a cause of a phonological deviation. Table 7.1 shows the consonants that are present (the existing articulation skills) and the consonants that are absent in the child's inventory (the blank cells). This is done without regard for the accuracy of the child's productions. Although the child has an articulatory knowledge of eight of the eighteen Swedish consonants, some deviations occurred in different positions, due to limited information of phonetic features, limited knowledge of the rules of pronunciation, etc. The table shows that he produced only five of his eight consonants correctly in the initial position, six of them in the medial position and five of them in the final position. The consonants which caused deviations were /g, ç, m/. 101 Computer-Based Speech Therapy Using Visual Feedback Table 7.1. The table shows the existing articulation skills of a profoundly hearing-impaired child (in bold), the word position where the consonant was correctly articulated and the word position where a deviation occurred, (Öster 1995a). IPA Existing articulation skills p t k k yes yes yes b d g b d g yes yes no yes yes no yes yes no f v s f yes yes yes C4 no m no yes no r yes yes yes S C4 ∆ h m n N l r 7.3.2. 
Used correctly Initial Medial Final Step 2: Assessment of the usage of the existing articulation skills through a detailed phonetic analysis After this, the usage of the consonants that the child knows how to articulate is assessed, because the consonants that the child can articulate in isolation are not always correctly realized in different linguistic contexts. The child needs to learn to use his existing articulation skills appropriately in all word positions before new speech sounds can be taught. The assessment is done by analysing 102 Phonetic realizations of phonological systems the detailed phonetic transcription made. The transcriber must try to capture all the relevant visual details, either at the time of recording or through the use of video, in order to complete the picture of the child’s contrastive system, according to Summerfield (1979). In Table 7.2 the existing articulation skills are shown in bold style /k, b, d, g, f, ç, m, r/ and the deviations made for those consonants in initial, medial and final word-position. Table 7.2. Phonetic consonant deviations made in various word positions in a profoundly hearing-impaired child's speech. The existing articulation skills are shown with arrows (Öster 1995a). Shaded cells = phonotactically not applicable. Blank cells indicate that the consonant was pronounced correctly. Hence only the consonants /k, d, f, r/ were correctly produced in all word positions. It is also interesting to study how the child used his 103 Computer-Based Speech Therapy Using Visual Feedback existing articulation skills for those consonants that he had not yet learned. Special attention should be paid to the consonants /g, ç, m/. The child has an articulatory knowledge of these sounds, but his productions always (except /m/ in medial position) involve substitutions of other phonemes when he used them in a linguistic surrounding. It can be seen that [g] is used for /N/ but when asked to produce /g/ it is realized as [k]. 
Similarly, [ç] is substituted for /s/ and /ʃ/, but when the child is asked to produce /ç/ he substitutes it with [k]. The nasal /m/ is produced almost correctly in the medial position, but in the initial and final positions it is phonetically similar to /b/.

7.3.3. Step 3: Assessment of idiosyncratic realisations of phonological contrasts and regular error patterns

Finally, a close study of Table 7.2 offers important information about substitutions and the articulatory details used to signal differences between consonants. Through this analysis it is possible to determine whether a deviant pronunciation is, in fact, an attempt by the child to realize a phonological contrast, and to find the child’s regular error patterns (phonological rules). Some of the deviant phone types represent different phonemes despite their phonetic similarity. For example, many of the deviant phone types are similar to /b/ and /d/. It can be assumed that the child, despite the phonetic similarity to [b], makes contrasts between /p/, /b/ and /m/ in the initial position through a voiced non-aspirated /p/, lip protrusion for /b/ and nasal air emission for /m/. It is also very likely that the child, despite the phonetic similarity to [d], contrasts /t, d, j, n, l/ in the final position, through a non-audible release for /t/, retroflexion and frication for /j/, nasal air emission for /n/ and retroflexion for /l/. Obviously, the child understands the phonological contrasts between these visually similar consonants, but has difficulties in realising them correctly.

Many of the deviant phonological processes found in the speech of normally hearing children (Nettelbladt, 1983), for instance simplifications like “stopping” of fricatives, can also be found in the speech of prelingually and profoundly hearing-impaired children. Normally hearing children produce these simplifications during a short period of their development.
However, profoundly hearing-impaired children often preserve these processes, together with other deviant processes that are due to the special conditions under which they learn to speak (Oller and Kelly, 1974; Dodd, 1974).

7.4. The importance of a detailed phonetic transcription

The outcome of this phonological analysis through a detailed transcription has provided significant and valuable information that could not have been obtained if only a traditional phonetic error analysis and a coarse transcription had been used. By studying the detailed transcription, which reveals important articulatory details, it is possible to determine whether the child’s deviant pronunciations are, in fact, attempts to realize phonological contrasts due to visual interference. Figure 7.1 shows the different outcomes of a coarse and a detailed phonetic transcription of the child’s consonant production. A coarse phonetic transcription would have missed the important articulatory details that the child used in trying to realise the contrasts discussed above. The result of the coarse phonetic transcription would have been that /p, m/ were substituted by /b/ and /t, j, n, l/ by /d/, i.e., that the child was missing /p, m, t, j, n, l/ in his phonetic inventory. Articulatory training of each of these consonants in isolation might dissolve the unique phonological system of the child. Instead, the speech therapy must be directed to deal with all these contrasting consonants simultaneously.

Visual feedback has proved to be excellent for developing contrasts between visually similar consonants in the speech of profoundly hearing-impaired children. In a study by Öster (1989b, 1989c), improvements in producing durational contrasts between long and short vowels and voicing contrasts between velar stops in medial and final word positions were found after some training with a computer-based speech therapy program with visual feedback.
This type of training made the children aware of their deviant way of expressing contrasts and helped them train correct contrasts through the immediate and meaningful visual feedback.

Figure 7.1. The outcome of a coarse and a detailed phonetic transcription of a profoundly hearing-impaired child’s phonetic deviations (Öster 1995a).

7.5. Therapy based on existing skills

As has been shown above, a phonological assessment provides information on existing skills, i.e. the number and type of correctly articulated consonants and vowels regardless of position, how these are used, and whether a deviant production may in fact be an attempt to realize a phonological contrast. Because this kind of assessment shows what the child’s speech expresses, it provides suitable information to base the speech therapy on. Expanding the usage of a child’s existing articulation skills before new speech sounds are taught has proved to be an effective and fruitful therapy method for extending the child’s inventory and improving the intelligibility of his or her speech.

To get a general view of the average phonetic and phonological competence of a group of 11 profoundly hearing-impaired children, a study was made by Öster (1991). The phonetic analysis also showed how visibility affects the acquisition of profoundly hearing-impaired children’s segmental production and which vowels and consonants they learn first. Eleven prelingually profoundly hearing-impaired children, educated by sign language, participated in this study. One child was eleven years of age, while the others ranged from fifteen to seventeen years. Their pure-tone averages (0.5, 1 and 2 kHz) were between 90 and 108 dBHL in the better ear. The children were video-recorded when they read a list of familiar words that were also illustrated by pictures.
The word list contained at least two presentations of all Swedish consonants and vowels in initial, medial and final position, if phonotactically possible. A narrow phonetic transcription was made from the video-recorded speech by the author using IPA symbols and additional diacritics, see Appendix 1.

Figure 7.2 shows the number of children who, at least once in the material, articulated each Swedish consonant correctly regardless of position and without any reference to contrastive function. The children, as a group, articulated 70% of the Swedish consonants correctly. The consonants which most of the children articulated correctly were those that are visually contrastive and easy to lip-read, such as bilabial and dental stops, labiodental fricatives and the lateral consonant. The children also produced the unvoiced stops, /p, t, k/, much better than the voiced stops, /b, d, g/. This result shows that profoundly hearing-impaired children first learn to pronounce those consonants with visible speech organ gestures through imitation. However, distinctive features such as nasality/non-nasality and voiced/unvoiced, which are invisible in lip-reading, can very well be taught to the children by meaningful graphical visual feedback, for instance through different colours: green for unvoiced, red for voiced and blue for nasals.

[Figure 7.2: bar chart. Title: 11 hearing-impaired children, correctly articulated Swedish consonants. Vertical axis: number of children (0-11). Horizontal axis: Swedish consonants, in the order p, l, t, f, m, n, k, d, r, tj, j, s, ng, b, sj, h, v, g.]

Figure 7.2. Existing articulation skills of the Swedish consonants by eleven profoundly hearing-impaired children of grade 10 (Öster 1991).

Figures 7.3-7.5 show the correct usage (black bars) in different positions of those consonants which the children knew the articulation of (grey bars).
The children, as a group, correctly articulated 70% of the Swedish consonants, but could only make correct use of 43% of them in initial position, 50% of them in medial position and 50% of them in final position. This indicates a discrepancy between correct articulation regardless of position and correct articulation with reference to contrastive function and phonetic contexts. A large difference in height between the two bars, representing correct articulation (phonetically) and correct use (phonologically), indicates that this particular consonant was difficult for profoundly hearing-impaired children to realize correctly in this position. By means of a phonological assessment of a detailed phonetic transcription, as in Table 7.2, it was possible to examine in what way a consonant was realized in various word positions and what constitutes the discrepancy between correct articulation and the effect of phonetic context.

Only a few consonants were controlled as well phonologically as phonetically, like /t/ in initial position, /p/ and /m/ in medial position and /ŋ/ in final position. Others varied in correct use according to word position. For example, five children could articulate /ɧ/ in isolation, but none could articulate it correctly in the initial position, only three in medial position and four in final position. Five of seven children were able to use /s/ correctly in the initial position, five of seven in medial position and four of seven in final position. None of the four children who articulated /v/ correctly in isolation pronounced it correctly in medial and final position. However, three of the children pronounced /v/ correctly in initial position. Some of the consonants, /j, d, k, b/, were equally difficult to use and pronounce correctly in all word positions.

The same discrepancy between the correct articulation of vowel quality and its representation in spoken language can be seen in Figure 7.6.
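The comparison between articulation and usage can be made explicit with a simple tally. The following is a minimal sketch, assuming a dictionary layout of my own choosing; it uses the single child’s data from Table 7.1 rather than the group data, and the names USED_CORRECTLY, usage_counts and deviating are hypothetical. Positions that are phonotactically not applicable are not modelled.

```python
POSITIONS = ("initial", "medial", "final")

# Existing articulation skills of the child in Table 7.1, mapped to the
# word positions where each consonant was used correctly.
# The data layout is an assumption made for this illustration.
USED_CORRECTLY = {
    "k": {"initial", "medial", "final"},
    "b": {"initial", "medial", "final"},
    "d": {"initial", "medial", "final"},
    "g": set(),
    "f": {"initial", "medial", "final"},
    "ç": set(),
    "m": {"medial"},
    "r": {"initial", "medial", "final"},
}


def usage_counts(used_correctly):
    """Count, per word position, how many articulated consonants are used correctly."""
    return {
        pos: sum(pos in positions for positions in used_correctly.values())
        for pos in POSITIONS
    }


def deviating(used_correctly):
    """Return consonants that are articulated but not used correctly everywhere."""
    return sorted(
        c for c, positions in used_correctly.items()
        if len(positions) < len(POSITIONS)
    )


counts = usage_counts(USED_CORRECTLY)
print(len(USED_CORRECTLY), "consonants articulated")
for pos in POSITIONS:
    print(pos, counts[pos])
print("deviating:", deviating(USED_CORRECTLY))
```

The per-position counts correspond to the black bars and the size of the inventory to the grey bars; the consonants returned by deviating are the natural first candidates for therapy based on existing skills.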
Profoundly hearing-impaired speakers have a preference for long vowel duration, which according to Oller (1981) is explained by the fact that they depend heavily upon vision and that vision simply does not operate in as rapid a time frame as audition. In this study, the main interest was to find out which vowels are generally acquired first through their visual accessibility; the quantity was of less importance than the quality.

[Figures 7.3-7.5: bar charts for initial, medial and final position. Title of each: 11 hearing-impaired children. Vertical axis: number of children (0-11). Horizontal axis: Swedish consonants (initial position: p, l, t, f, m, n, k, d, r, tj, j, s, b, sj, h, v, g; medial and final position: p, l, t, f, m, n, k, d, r, j, s, ng, b, sj, v, g). Bars: articulation and usage.]

Figures 7.3-7.5. Grey bars show the number of children who articulated each consonant correctly. Black bars show the number of children who made correct use of their articulation in initial, medial and final word positions.

[Figure 7.6: bar chart. Title: hearing-impaired children, vowel quality. Vertical axis: number of children (0-11). Horizontal axis: Swedish vowels, in the order a, ö:, å:, ä:, e:, u, o:, å, u:, y:, a:, i:. Bars: articulation and usage.]

Figure 7.6. Grey bars show the number of children who articulated each vowel correctly regardless of phonetic context. Light-coloured bars show the number of children who made correct use of their articulation in spoken language. The vowels are represented by orthographic symbols.

The result shows that all children controlled the quality of the short front and open vowel /a/, which means that this vowel could be seen as the first vowel acquired by prelingually and profoundly hearing-impaired children. Almost all children knew the production and the rounding contrast between /øː/ (ö:) and /ɛː/ (ä:) thanks to visibility.
Also the quality of the back close-mid long vowel /oː/ (å:), which is pronounced with a relaxed and flat tongue and a rather visible open jaw position, was well articulated by ten of eleven children. However, the peripheral vowels /ɑː/ (a:), /iː/ (i:), and /uː/ (o:), which have the most distinct articulatory positions, were the vowels that the children had less articulatory control of. Many children made several confusions and substitutions when they tried to use the vowel quality correctly in spoken language. Especially the quality of /eː/ (e:) and /ʉː/ (u:), and to some extent the quality of /ɛː/ (ä:), /uː/ (o:) and /ɔ/ (å), seems to be difficult to use for many children. These vowels were substituted by other vowels.

The first step of a therapy based on existing articulation skills (those consonants and vowels that the child knows the articulation of regardless of position) should be to train the correct pronunciation of these in all possible contexts in spoken language, in order to level out the discrepancy between correct articulation and correct usage. Individual differences will determine the particular training material and the order selected for training, since the training is based on existing skills. The treatment is influenced by the Sensory-Motoric Approach devised by McDonald (1964). Bisyllabic drills are used, beginning with small changes of tongue movement between the production of the vowel and the consonant, followed by progressively larger changes of movement. Game-like strategies with visual feedback have proved to be helpful and motivating in increasing the children’s accuracy of articulation, repetition of syllables with different articulatory positions, coordination of articulators, breath control, and rhythm and stress placement.

7.6. Description of a phonological analysis of a Bosnian speaker’s production of Swedish vowels

As mentioned in section 1.1.2, L2 speakers articulate some target phonemes deviantly because these differ in number and quality from those of their mother tongue (L1). Many vowels look similar and cause systematic vowel substitutions in their speech. In this case a coarse transcription of the pronunciation of vowels provides sufficient information to base the therapy on. Table 7.3 shows a phonological analysis based on a coarse transcription of a Bosnian speaker’s production of Swedish vowels.

The Bosnian speaker trained together with twelve adult immigrants with a computer-based speech training system with audio-visual feedback of both perception and production of spoken Swedish. Six of them originated from Bosnia, the others from Cuba, Peru, Saudi Arabia and Russia. On average they participated in a total of six half-hour training sessions. The training was carried out twice a week. Their speech was analysed phonetically and phonologically to get an individual diagnosis of their deviations, their existing articulation skills and the linguistic representation.

In the first column, “Articulates”, all the client’s different productions of each of the Swedish vowels are shown. The more alternatives there are for each vowel, the more uncertain the client was about how to articulate the vowel. The result is similar to a phonetic error analysis, as it shows what the client is not capable of producing correctly. The symbol (ː) after a vowel means that the client sometimes produced the vowel as a short and sometimes as a long variant. The next step is to find those vowels that the client knows how to produce correctly, i.e. the existing articulation skills. This is compiled in the next column, “Existing articulation skills”.
It can be seen that the Bosnian speaker knew the quality of 9 of the 22 Swedish vowels. The last column reveals how the client substituted the vowels that he could articulate for other vowels that he could not articulate. In other words, the last column, “Represents”, shows what his speech expressed and the representation of his existing articulation skills in spoken language. For instance, it can be seen that he used the rounded back vowel /ɔ(ː)/ for all rounded back vowels and also for the rounded front vowel /ø(ː)/.

A close analysis of his articulation and usage of Swedish vowels gives important information about the client’s difficulties and what causes them. This can be of great use in the therapy. It can be seen that he has difficulties in producing the difference in rounding between /ɪ, iː/ and /ʏ, yː/. Actually, he is not able to produce any of the rounded front vowels /yː, ʏ, ʉː, ɵ, øː, ø/ but substitutes these with the closest back vowel. This is due to the fact that front rounded vowels are missing in his native language. He has learnt to withdraw his tongue together with lip rounding, as the only rounded vowels in his native language are back vowels.

Another explanation for some of his deviations is the fact that he has problems with the orthography. The vowels /ɛː-oː/ (ä-å) are confused, as well as the vowels /uː-øː/ (o-ö) and /ɑː-ɔ/ (a-å). This is most likely due to a lack of knowledge of the Swedish pronunciation rules.

Table 7.3. A phonological analysis of a Bosnian speaker’s production of Swedish vowels.

IPA   Articulates   Existing articulatory skill   Represents

iː iː ɪ iː iː yː ʏ ɪ ɪ ɪ ɪ ʏ iː eː eː eː eː ə ɛː ɛ ɛː ɛ oː ɛ ɛː ɛ ɛː æː æ a a ɑː a(ː) a(ː) ɑː ɔ a(ː) ___ yː iː ___ ʏ iː ɪ ___ ʉː uː ___ ɵ ʊ ___ øː ɔ(ː) uː ___ ø ɔ ʊ ___ œː œ uː uː ʉː øː uː ʊ ɔ(ː) uː ʊ ɔ ʊ ɵ uː ø ɔ oː ɔ(ː) ɛː ___ ɔ ʊ ɔ ɑː uː ø øː ʊ oː ɔ ɔ(ː)

Such an assessment should be the basis for speech therapy of L2 speakers.
It is important to make the client aware of his difficulties and the cause of them. His usage of his existing articulation skills should be trained before new vowels are taught.

7.7. Conclusions

The outcome of the phonological analyses described in this chapter reveals that there are both similarities and differences between hearing-impaired children and second language learners when learning Swedish, and that a phonological analysis provides important information which is of great value in both CBST and CAPT. In both cases the phonological analyses showed what the profoundly hearing-impaired child and the Bosnian speaker were not capable of producing correctly, and how they substituted the phonemes that they could articulate for other phonemes that they could not articulate in a linguistic context. However, for the profoundly hearing-impaired child, due to interference from the visual modality, a fine phonetic transcription is necessary to reveal details of the idiosyncratic realizations of normal phonological contrasts.

8. Design of visual feedback in Swedish computer-based therapy systems

8.1. Introduction

Today, computer-based speech therapy systems with visual feedback have met with a positive response, and they are commonly used by speech therapists and teachers in Sweden (Öster, 1995b, 1996). Computer-based speech therapy with visual feedback has turned out to be a valuable and efficient complement to traditional speech therapy for all types of hearing-impaired children, and is meant to be supervised by a teacher.

In this chapter my research dealing with the design, functionality and type of visual feedback of three different Swedish systems will be reviewed.
The systems will be related to the different needs, aims and possibilities of computer-based speech training with visual feedback for moderately, severely and profoundly hearing-impaired children and L2 learners. This is of importance for how the type and pattern of the external visual feedback ought to be shaped and what type of pedagogical applications should be used to fit each group. Two of the systems are commercial products and one is a research prototype in progress. All of them are product-oriented systems.

8.2. Visual feedback in speech therapy

Visual information as an additional source of feedback can be provided by traditional training (by the teacher alone, without the assistance of computer-based systems) or by a computer-based system. The visual feedback provided by traditional therapy is given by the teacher and can be summarized as follows:

• external
• delayed
• verbal
• subjective

The visual feedback provided by a computer-based speech therapy aid alone is quite the opposite. It is:

• internal
• in real time
• non-verbal
• objective

A combination of these two possibilities of feedback, meaning that a teacher uses a computer-based system with visual feedback as a complement to the traditional training, ought to give the best result during treatment. In this way the teacher has every possibility to assist the child in developing awareness of his own production, comparing his production to a target and following his own progress.

8.2.1. Nature of feedback

The nature of the feedback of the product-oriented computer-based speech therapy systems is parametric, as it gives acoustic information (spectral or temporal analysis) about the speech product in real time. Within motor learning theory, this feedback is called Knowledge of Result (KR) and is an essential component of learning a motor behaviour (Mahshie, 1995, 1996).
The feedback of process-oriented systems, on the other hand, gives instruction on how to move the articulators to reach the target production. Within motor learning theory this feedback is called Knowledge of Performance (KP), and according to Mahshie (1995) it is most useful in the instruction phase, to make the child aware of his deviant production and understand how a correct realisation should be produced.

8.2.2. Type of feedback

The following types of visual feedback will be exemplified in detail in the following sections, where each therapy system is described. They will also be discussed according to clinical usability.

➢ Animated graphics
  • for basic awareness, illustrating selected dimensions of speech such as pitch, loudness, timing, and the presence or absence of voicing, to establish a relationship between the graphics and vocal aspects. This can be a balloon, for instance, that gets larger in relation to the loudness of the child’s vocalization.
  • for correct control of the voice, pitch, or intonation

➢ Evaluative feedback provides an indication of success or failure or provides a measure of “goodness”. It can be either:
  • exact, telling whether the production was correct or incorrect, or
  • acceptable, showing that the child is “on the way” and has made some improvements. This can be shown through a digit from 1 to 5 according to how correct the production was.

➢ Navigational feedback informs the child how to reach the production and is instructional and informative. It can be:
  • comparable, often through a comparison of the child’s production with a correct production of a reference model, see Figure 8.5, or
  • criterion-referenced, based on visual targets of the child’s production that are compared to the norm and to the child’s own improving productions, as in the visual maps of OPTACIA, see Figure 8.18, or
  • articulation pictures showing the correct position of the vocal organs during the articulation of individual speech sounds.

➢ Rewarding feedback should stimulate and motivate the child to go on with the training. The training must be pleasant and amusing. The rewarding feedback can reward a production that is:
  • hitting the mark, and consists of both visual and auditory effects, such as a duck that moves to the right and opens its mouth to show that it is happy with the articulation. If not, the duck will remain standing with its mouth closed.
  • encouraging, as it reinforces any improvement, for instance through a flower in a pot that develops from nothing to a sprout, to a few leaves, to a blooming flower according to the correctness of the pronunciation.

➢ Recognition used together with different graphics helps the child with the repetitive and additional training that is needed to stabilise a recently learned pronunciation. It can be:
  • phoneme-based comparison, comparing spectra with models of the child’s best productions, or
  • ASR (automatic speech recognition) of words with targets of the child’s best productions.

➢ Finally, optional auditory feedback can be used with hearing-impaired children with moderate hearing losses and with L2 learners, through the playback of stored recordings and through rewarding sound effects.

8.3. The IBM SpeechViewer

8.3.1. Description of the system

The original SpeechViewer program was developed 15 years ago (Crepy et al. 1983) and is now available in twelve different languages.
It has been translated and adapted to Swedish in three different versions (Öster, 1988a, 1989b, 1996; Lotsson, 2001). The system consists of a PC with a colour display. A microphone, an amplifier, and a speaker connected to an M-ACPA card allow the user to input, store and analyse speech and then display it and play it back. The software contains 13 interactive programs, shown in Figure 8.1, aimed at assisting a child in achieving awareness of and control over various speech attributes such as voicing, timing, pitch and loudness, as well as refined articulation and prosody. Feedback is given immediately through a variety of graphical designs and game-like strategies synchronised with optional auditory playback. The programs are grouped into three sections for different areas of use: basic awareness, skill building and phonology exercises, and speech patterning modules.

Figure 8.1. The thirteen different programs included in SpeechViewer III.

8.3.2. Type of visual feedback in different exercises

The displays for basic awareness illustrate selected dimensions of speech such as pitch, loudness, duration, voicing and breath control for phonation through playful and easy exercises. These exercises are meant to be used by young profoundly hearing-impaired children to develop fine control of their voices. All the exercises are easy to understand and provide visual feedback in the form of animated graphics for basic awareness according to the child’s production. In the “Sound” exercise, something moves on the screen, showing that a sound is produced. A helicopter flies when the child changes the pitch up and down in the exercise “Pitch range”. In the exercise “Voicing”, voiced sounds turn a figure red and unvoiced sounds turn it green. As an example, Figure 8.2 illustrates how to train correct loudness in the exercise “Loudness range”.

Figure 8.2. A screen of the exercise “Loudness range”. The child can see the balloon grow as he makes the sound louder. It is also possible to set two different targets for matching low and high volume.

The skill building modules provide a game-like strategy to strengthen ability in pitch, timing, voicing, and speech sound production. In the “Pitch control” module, for example, the teacher is able to arrange a “steeplechase course” consisting of a number of targets and obstacles placed over the screen. The child has to move a figure through the course, hitting targets and avoiding obstacles by controlling his pitch, see Figure 8.3. Animated graphics are used for control of the voice and pitch. An exact, evaluative feedback is given after complete success. A rewarding visual and optional auditory feedback, given when the child hits the mark, should stimulate the child to go on with the training.

Figure 8.3. Example of an exercise for pitch control. The goal is to hit the targets (petrol can and petrol pumps) and avoid the obstacles (traffic signs) by controlling the car through raising and lowering pitch.

The phonology exercises aim at establishing a consistent and intelligible pronunciation of the Swedish phonemes. Through a phoneme-based comparison, which compares spectra with models of the child’s best productions, the computer evaluates the accuracy of the child’s production. These exercises are to be used in the phase of speech therapy where the child is “on the way” to acquiring a correct production of a particular speech sound. The child’s own “best productions” should be recorded and stored as targets and used as models in exercises for accuracy, for matching words containing the target phoneme, and for contrasting. The goal is accomplished when the child’s best production becomes his or her most common production.
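The idea of a phoneme-based comparison against the child’s own best productions can be sketched as a nearest-template match. This is a minimal illustration under stated assumptions and is not the method used in SpeechViewer, whose internal algorithm is not described here: spectra are assumed to be short lists of band energies, the distance is plain Euclidean distance, and the names spectral_distance, closest_model and is_accepted, as well as the threshold value, are hypothetical.

```python
import math


def spectral_distance(a, b):
    """Euclidean distance between two spectra given as equal-length lists."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_model(spectrum, models):
    """Return (label, distance) of the stored best production nearest to spectrum.

    models maps a phoneme label to a list of stored spectra.
    """
    best_label, best_dist = None, math.inf
    for label, templates in models.items():
        for template in templates:
            dist = spectral_distance(spectrum, template)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist


def is_accepted(spectrum, target, models, max_distance):
    """Accept the attempt if the nearest model is the target and is close enough."""
    label, dist = closest_model(spectrum, models)
    return label == target and dist <= max_distance


# Invented band energies standing in for a child's stored best productions.
models = {
    "s": [[0.1, 0.2, 0.9, 0.8]],
    "ç": [[0.1, 0.7, 0.8, 0.2]],
}
attempt = [0.1, 0.3, 0.8, 0.8]
print(closest_model(attempt, models)[0])
print(is_accepted(attempt, "s", models, max_distance=0.3))
```

Because the templates come from the child’s own recordings, such a comparison sidesteps the inter-speaker variability that arises when a child is matched against models taken from other speakers.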
Because of inter-speaker variability, dialects, and deviant speech due to hearing disorders, this strategy of using the child’s own best productions as models is also used with the CISTAAid training (Youdelman, 1994) as well as with the ISTRA-Aid (Kewley-Port & Watson, 1995). In previous versions of SpeechViewer, standard models, based on normally hearing children and adults, were constructed as models for training in the instruction phase. This approach was of course not satisfactory and was a common concern of many users of the earlier versions (Mahshie, 1998).

In the exercises for “Accuracy”, “Chaining” and “Contrasting” the sustained phonation of one to four phonemes can be trained or matched in words. Figure 8.4 shows an example of the Two-Phoneme Contrast exercise, where the task is to manoeuvre the jeep through an obstacle track by pronouncing /s/ to turn to the left and by pronouncing (tj) [ɕ] to turn the jeep to the right.

Figure 8.4. Display of the phonology exercise “Contrasting”. The task is to manoeuvre the jeep along the track avoiding the animals. When pronouncing /s/ the jeep goes to the left and by pronouncing /tj/ [ɕ] the jeep goes to the right.

All these exercises use animated graphics for control and give an evaluative exact feedback as well as a rewarding visual and optional auditory feedback when the child hits the mark to stimulate the client to go on with further repetitive and additional training.

The speech patterning programs display the speech signal as oscillograms, spectra or spectrograms for analysis and training of refined articulation and prosody. The Speech Patterning Module “Pitch and Loudness” displays F0 and/or intensity in real time. A split screen provides a comparison of a learner’s utterance with a model of the teacher’s.
The comparison of the two visual patterns gives the learner possibilities to discriminate between any important differences of the distinctive features in the Swedish language. This possibility of comparison of both the visual pattern and the auditory feedback has been shown to be very effective in the instruction phase for both L2 learners and hearing-impaired children (Öster 1995b, 1995c, 1998, 1999a). This comparable navigational feedback is logical and shown in real time and makes the clients understand in what way their speech differs from the teacher’s. Figure 8.5 shows a training session of repeated syllables with this module. In the upper portion, the therapist’s pattern of repeated production of the syllable /ma/ is shown. In the lower portion, a profoundly hearing-impaired child’s deviantly produced nasal in the repeated production of the syllable is shown.

Figure 8.5. Display of loudness over time in “Pitch and Loudness”. In the upper portion, the therapist’s pattern of repeated production of the syllable /ma/ is shown. In the lower portion, a profoundly hearing-impaired child’s deviantly produced nasal in the repeated production of the syllable is shown.

Throughout the system the clinical management allows the creation of sustained phoneme models, speech models, and client profiles as well as the reporting and management of client data. Statistics of each activity are saved to follow the improvements of the child.

8.4. Box of Tricks

8.4.1. Description of the system

This teaching and training system for speech- and hearing-impaired children aged 4-10 was developed in the EU project “SPECO” within the INCO-Copernicus program between 1998 and 2001 (Vicsi et al. 2000).
Five partners were involved in the project: the Technical University of Budapest, Hungary, the University of Reading, United Kingdom, the University of Maribor, Slovenia, the Robot Control Software, Hungary and KTH, Sweden. The system was developed for four languages: Hungarian (Varázsdoboz), English (Box of Tricks), Swedish (Trollerilådan), and Slovenian (ARTI).

The system is product-oriented and offers parametrical acoustic feedback of the speech in real time. The system displays important articulation features graphically. These are called speech pictures and are presented in a clear and entertaining way. The speech pictures make it easy for a child to distinguish between an incorrect articulation and the correct articulation of a reference model. The children are able to learn a correct pronunciation by looking at their own speech pictures and comparing them with the speech picture of the correct pronunciation.

The structure of the system is presented in Figure 8.6. The system comprises a general language-independent measuring tool and a database editor. The separation of the complex sounds into their component frequencies is done in critical filter bands, from 80 Hz to 8 kHz. 20 critical bands are used. The database editor makes it possible to construct different modules and vocabularies for other languages. Two language-dependent speech and picture databases, one for teaching and training vowels (the “Vowel Support”) and the other for fricatives and affricates (the “Fricative Support”), are constructed for each of the four languages. All training words are illustrated for small children who have not yet started reading.

[Block diagram with the components: hearing model; time warping and normalization; distance computations; statistical data and dictionary of correctly pronounced data; visualization.]

Figure 8.6.
The structure of Box of Tricks.

Figure 8.7 shows the main menu, consisting of the microphone adjustment possibility, the sound preparation exercises, the vowel training vocabulary, the fricative training vocabulary and the intonation exercises. Figure 8.8 shows an example of illustrated words for training, containing the vowel /ä/ [æ, ɛ] in the middle position.

Figure 8.7. Main menu of the Swedish version of Box of Tricks.

Figure 8.8. Choice of illustrated words for training of the Swedish vowel (ä).

8.4.2. Databases for reference speech and normal-hearing children for similarity comparisons

Recording and storing a child’s best pronunciation, as the therapist has to do when working with SpeechViewer, has been shown to place too great a demand on some therapists’ experience of basic computer knowledge as well as articulatory and acoustic phonetics (Öster 1996). Therefore the basic idea when developing Box of Tricks was to incorporate a reference speaker as a model for visual and auditory training to make this system more user-friendly. The reference speaker was a ten-year-old Swedish girl with clear and good pronunciation of standard Swedish.

The system consists of two types of databases, one for the reference speech that makes comparisons with a correct model possible, and the other called “the child database” that was used to determine models and limit values for the phoneme-based recognition of vowels and fricatives and for background pictures that represent the lower and the upper limit of the spectral envelope for an acceptable pronunciation, see section 8.4.4.

Database for reference speech

The Swedish system contains stored reference speech examples recorded as the girl read 17 isolated phonemes, 174 speech units, 350 words, 43 minimal word pairs and 127 sentences.
Each utterance was recorded three times using a special editor incorporated in the system. The best utterance was saved and chosen to appear as the reference model in the exercises.

The system presents acoustic parameters of the reference speech in a way that is understandable and interesting for young children, and also correct from the acoustic-phonetic point of view. Amusing illustrations, called “Speech Pictures”, either emphasize important speech parameters to make them understandable or show the reference speaker’s pronunciation of phonemes, words and sentences on the upper half of the screen for comparison with the child’s production on the lower part of the screen. Articulation pictures, spectrograms and target spectra are also used to make the visual feedback intelligible. All training phonemes and words are illustrated for small children who have not yet learned to read.

All reference speech training material is segmented, and time-warping algorithms are used to present the reference speech and the child’s speech immediately below each other, even if the speaking rates are different. The child’s pronunciation of a vowel and a fricative consonant in isolation, in a syllable, in a word and in a sentence is compared according to similarity with the reference speaker’s pronunciation. Different types of feedback are used to motivate and stimulate the child to go on with the training.

Child database

A Swedish database was collected to measure the acceptable range of speech parameter variation of Swedish children around six years of age with intelligible speech. The text material contained all Swedish sibilants and /j/, all long vowels, and the three short vowels (u, a, å) (differing in quality between long and short variants) in isolation, syllables and short series of syllables, words and sentences. In words, all examined speech sounds occurred in all possible positions. One-, two- and three-syllable words were included.
31 children between 6 and 9 years of age were recorded as they read the text material. Table 8.1 shows age and gender distribution of the children. All children belonged to a local primary school in the south of Stockholm. All children had normal hearing and a distinct and good pronunciation. The group was assessed by the author to be a good sample of Swedish children who talk properly for their age.

Table 8.1. Age and gender distribution of speakers.

AGE    BOYS    GIRLS
9      4       4
8      4       3
7      7       5
6      2       2

The recordings took place in a quiet room in the school building. The recordings were made with a Casio DA-7 digital tape recorder and a Monacor ECM-100 electret microphone fitted in a stand. The person who recorded the speech of the children sat beside each child and made sure that the child did not change position or touch the microphone or the stand. Children who could not read fluently repeated by ear. The total recording time per speaker was about 15 minutes.

The recordings were classified by the author as good, acceptable, and unacceptable productions. The averaged energy values and spectrum lines of the good and acceptable productions were used to determine allowed spectral deviations for the vowels and fricative background pictures. During the training the actual spectrum lines must fall within the two spectrum lines in the speech picture, see Figure 8.10.

8.4.3. Training method used in Box of Tricks

The system is based on important steps of traditional speech therapy (Öster et al., 1999b; Vicsi et al., 2000).
These steps are:

- Sound preparation:
  • Loudness
  • Pitch
  • Rhythm
  • Spectrum
  • Voicing
- Sound development:
  • articulation pictures
  • isolated phonemes
  • syllables with the target phoneme in all positions
  • repetition of syllables
- Sound sequences in words, minimal word-pairs and sentences:
  • fricatives can be practised at the beginning, in the middle or at the end of a word
  • vowels can be practised in monosyllabic or polysyllabic words
- Intonation exercises
- A user manager gives the possibility to
  • set the children’s data
  • describe their speech problems
  • document their improvements.

8.4.4. Type of visual feedback in different exercises

Sound preparation

In the sound preparation exercises, the child has the possibility to train different speech parameters like loudness, rhythm, sustained sound, voicing, and pitch. The aim is to get the child used to paying attention to the screen and be aware of important speech parameters. The child can see how a figure moves further up when the loudness increases, or keep the pitch or loudness steady for a short time to move a butterfly over a worm or for a longer time over a flower; the neck of a duck moves up and down according to the child's pitch and so on. All the exercises are easy to understand and provide visual feedback for awareness in the form of animated graphics according to the child’s production.

Figure 8.9 shows an example of pitch training with one of the sound preparation exercises. The neck of the duck moves up and down according to the child's pitch. The pair of cherries and the bee above them provide two targets for practice.

Figure 8.9. A display of the Pitch module of the Sound preparation exercises.

Sound development

Among the sound development programs there is a possibility to study articulation pictures giving navigational feedback of correct position of the vocal organs during articulation of all phonemes that can be trained.
Vowels and fricatives can be trained in isolation or in all positions in a syllable and short units. Frequency spectra, calculated and displayed every 20 ms, are shown on the screen. In the “Isolation” exercise the form of the spectrum depends on the phoneme that is being trained. When correct pronunciation is attained, the spectrum falls inside a path that represents the lower and the upper limit of the spectral envelope for an acceptable pronunciation, based on the child database described in section 8.4.2.

Figure 8.10 shows examples of speech pictures representing target spectra of accepted isolated production of (ä) [æː] and (u) [ʉː]. The display to the left shows a successful pronunciation of the vowel (ä) [æː] as the spectrum falls inside the path. However, in the display to the right the active spectrum does not match the stored model of (u) [ʉː]. The automatic feedback, here shown as a flourishing flower, is based on a distance calculation between the spectral components of the stored spectrum and the active spectrum and gives the child a rewarding feedback that is encouraging. It can also be varied to display a digit, showing the outcome from 1-5, where 5 is the best.

Figure 8.10. Two examples of training results with the sound development exercise Isolation. The automatic feedback, in the form of a flower, is one of several variants of the encouraging evaluative visual feedback in the system (see text for details).

This type of feedback is an acceptable evaluative feedback. It ranks the performance of the child and shows when the child is on a winning streak. The use of automatic feedback provides opportunities for children to practise with the system alone or at home with parents. By using background pictures for comparison with a correct model this module also supplies a comparable navigational feedback.
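The two checks described for the “Isolation” exercise, whether the active spectrum falls inside the accepted path and how far it lies from the stored spectrum, can be illustrated with a minimal sketch. It is not the implementation used in Box of Tricks: the root-mean-square distance, the `step` parameter that converts distance into a 1-5 digit, the function names and the four-band example values are all assumptions made for illustration (the system itself uses 20 critical bands).

```python
from math import sqrt

def inside_path(spectrum, lower, upper):
    """Return True when every band level lies between the lower and upper limit."""
    return all(lo <= s <= hi for s, lo, hi in zip(spectrum, lower, upper))

def spectral_distance(spectrum, reference):
    """Root-mean-square difference between two band spectra (assumed metric)."""
    n = len(reference)
    return sqrt(sum((s - r) ** 2 for s, r in zip(spectrum, reference)) / n)

def score(distance, step=3.0):
    """Map a distance to a digit from 1 to 5, where 5 is the best.

    `step` (dB of distance per lost point) is a hypothetical tuning value.
    """
    return max(1, 5 - int(distance // step))

# Hypothetical band levels in dB for a four-band example.
reference = [60.0, 55.0, 40.0, 30.0]
lower = [r - 6.0 for r in reference]   # lower limit of the accepted path
upper = [r + 6.0 for r in reference]   # upper limit of the accepted path

good = [61.0, 54.0, 42.0, 29.0]        # close to the stored spectrum
poor = [45.0, 50.0, 55.0, 45.0]        # energy in the wrong bands

for name, spec in (("good", good), ("poor", poor)):
    d = spectral_distance(spec, reference)
    print(name, inside_path(spec, lower, upper), score(d))
```

In such a scheme the path test gives a yes/no answer that can drive the background picture, while the graded score can drive the flower or digit, so that partial improvements are still rewarded.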
The calculated average spectrum patterns of sibilants based on the child databases for each of the four languages are presented in Figure 8.11.

Figure 8.11. Average spectra of sibilants used for the four languages represented in SAMPA symbols, see Appendix 2.

The background pictures in the “Syllable-training” exercise in Figure 8.12 emphasize the energy measured in each frequency band of a spectrogram. The child has to focus on the segmented part containing the background picture illustrating the target phoneme for training, in this example /s/ as in (sun).

Figure 8.12. Two examples of training results with the sound development exercise Syllable-training.

The task for the child in the case of the sibilants is to cover the clouds, see Figure 8.12, in the lower part of the display with dots representing spectral energy, but not to cover the other parts of the symbolic background picture. In Figure 8.12 two displays of the training of an /s/ in the syllable (is) are shown. In the display to the left, a sibilant is produced in which the noise frequency is too low, which can be seen in the lower portion of the screen. The upper portion of the screen shows the model of the reference speaker. The background picture of the sun is covered with dots and the result is consequently 1 point. In the display to the right, a correct pronunciation is shown. The child’s production in the lower part of the display looks very similar to the correct model in the upper part of the display and the automatic feedback shows 5 points.

In Figure 8.13 spectrograms of the nine long and three short Swedish vowels that can be trained are shown.

Figure 8.13. Spectrograms of the Swedish vowels used in Box of Tricks.

The spectrograms form the basis of the symbolic background pictures shown in Figure 8.14 that are used as targets in the vowel support menu.
The task for the child is to cover the boxes representing the lowest formants, and not to cover the symbolic background picture representing the vowel.

Figure 8.14. The symbolic background pictures of the Swedish vowels used in Box of Tricks.

Figure 8.15 shows an example of how the symbolic vowel speech picture for /i/ is used in a word training layout with a rewarding feedback that is encouraging (a moving duck that opens his mouth).

Figure 8.15. A good result of a child training the vowel /i/ in a single word. The feedback consists of an encouraging rewarding feedback as well as a comparable navigational feedback (comparison with a correct model).

Sound sequences

When it comes to training the phonemes in words, minimal word-pairs and sentences, all phonemes can be trained in different positions and contexts. Fricatives are presented in CV, VCV, VC and VC-VC-VC positions and connected with all vowels. In the Fricative Support, all phonemes are presented in initial, medial and final position, and in the Vowel Support, all phonemes occur in monosyllabic and polysyllabic words. Minimal word-pairs are used to train differences between two phonemes in similar words by visual speech pictures. Sentences containing all phonemes, from simple to more complex, are also available for training.

In all these exercises an acceptable evaluative feedback is used. It ranks the performance of the child and shows when the child is on a winning streak. It can also be encouraging rewarding as it reinforces any improvement. The use of an automatic feedback provides opportunities for children to practise with the system alone or at home with parents. By using background pictures for comparison with a correct model a comparable navigational feedback is also used.
The exercises work together with recognition through a phoneme-based comparison of the child’s production with the pronunciation of the reference speaker.

8.5. OLP (Ortho-Logo-Paedia) Therapy

8.5.1. Introduction

The OLP-therapy method was developed within an EU Quality of Life and Management of Living Resources project, coordinated by the Institute for Language and Speech Processing, Athens, Greece, with participation from France (Arches), Greece (Altec S.A. and Logos Centre for Speech-Voice Pathology), Spain (Universidad Politécnica de Madrid), Sweden (KTH), and the UK (Sheffield University and Barnsley District General Hospital).

Three basic types of pathologies were addressed in the OLP project. These were clients with dysarthria (English language), cleft palate (Greek language) and children with hearing impairments (Swedish language). This organization provided the project with a natural division of client groups into languages and pathologies, thus enabling each clinical partner to concentrate on one particular pathology.

The project aimed at improving the quality of life of persons with articulatory impairments by applying a new technological aid to support (not replace) traditional speech therapy at the level of articulation, and making it available over the Internet. The therapy method integrates automatic speech recognition technology based on the child’s best productions, and the therapy is tailored specifically for each child.

OLP follows standard therapy design through therapy schedules of levels of therapy, which are specified based on knowledge of therapy design needed for each target client group. For Swedish prelingually and severely hearing-impaired children, the levels are: isolated vowels, syllables, repetition of the same syllables, repetition of different syllables, monosyllabic words, polysyllabic words, and short phrases.
Different types of exercises with a choice of technological tools are available at the various levels. Table 8.2 shows the available tools that are integrated in the OLP system and the tasks that are executed through them.

Table 8.2. Function of available software tools and their use in different tasks. (Protopapas, A., 2004, User’s Manual OLP document QL1971-ILSIN-C-097-a3).

Tool: Recorder
Functions: Records sound from the microphone, saving the recordings as audio files.
Tasks: Record words for assessment. Record single sounds to train phonetic maps. Record words to train recogniser.

Tool: Trainer
Functions: Creates a speech recogniser based on a set of recorded sounds.
Tasks: Train syllable recogniser. Train word recogniser.

Tool: Recogniser
Functions: Matches sounds from the microphone to a set of “known” words learned previously during “training” and displays the result graphically in the context of a “game”.
Tasks: Evaluate the production of a syllable and a word using graphical displays. Evaluate production of one command word from a word set relative to confusable others.

Tool: STAPtk
Functions: Supports the user in recording sounds, labelling them, designing, creating, and using phonetic maps.
Tasks: Label recorded sounds. Design a phonetic map display. Train a phonetic map. Display degree of match between spoken sound and a set of targets on a phonetic map.

Tool: Pitch display
Functions: Displays graphically the pitch of a voice detected in the microphone.
Tasks: Evaluate ability of the client to produce a rising pitch contour. Evaluate ability of the client to produce a steady pitch.

Tool: Loudness display
Functions: Displays graphically the loudness of whatever sound is picked up at the microphone.
Tasks: Evaluate ability to sustain phonation and to produce rising loudness contour.

Tool: Recogniser configuration tool
Functions: Supports the user in setting options for using the recogniser tool.
Tasks: (none listed)

Tool: Sound selector
Functions: Displays and selects from available recordings according to criteria specified by the user.
Tasks: Transfer recordings made with the OLPy Recorder to be used with phonetic maps (STAPtk).

8.6. System components

The OLP therapy system consists of four components: the user interface (OLPy) containing graphical interfaces for training with automatic word recognition, OPTACIA that uses the software component (STAPtk) for creating and using phonetic visual maps, the automatic speech recogniser (GRIFOS) and the functionality of remote administration (TELEMACHOS).

8.6.1. The user interface OLPy

OLPy is the part of the OLP system that keeps track of clients, schedules, tasks and exercises. It communicates with the system database where everything is stored, and organizes the appropriate tools when tasks are to be carried out. There is a structural design in OLPy that requires the therapist to follow certain steps to create and configure a schedule consisting of different tasks of interest for the child to train. Figure 8.16 shows the main OLPy window showing the therapist’s and child’s names, and a description of the tasks that are involved in added exercises. Exercises must be configured by the therapist who sets up the parametrically adjustable features of the individual tasks that belong to an exercise.

Figure 8.16. The main OLPy window showing the child’s name and schedule.

A library that supports therapy design for the specified client group is included in the system with the option of inserting new words. This wordlist contains all Swedish vowels in mono- and polysyllabic words (at least three examples of each) as well as all Swedish consonants in all possible positions (initial, medial, final) followed by both rounded and unrounded vowels if applicable.

Since all tasks are performed by several different software tools there are many configuration windows in OLPy. Figure 8.17 shows one of the configuration windows for a recorder task using the recorder tool.
Figure 8.17. Recorder configuration window with existing library to the left and the option of inserting new words to the right.

8.6.2. OPTACIA

OPTACIA utilizes the software STAPtk to create and use phonetic visual maps and was developed from the Optical Logo-Therapy (OLT) (Hatzis, 1999; Hatzis et al., 1999; Hatzis & Green, 2001). OPTACIA visualises phonetic contrasts between sounds and provides real time audio-visual feedback through a tailored acoustic-to-articulation kinematic mapping in 2D. OPTACIA is based on three basic, well-founded treatment principles: visuomotor tracking, visual contrast feedback, and visual reinforcement. Visuomotor tracking (Ziegler, Vogel, Teiwes, and Ahrndt, 1997) is a special case of biofeedback where some dynamic physical measure of performance is portrayed visually in real time. Visual contrast feedback is the important contrast between the correct and the misarticulated sound pattern that gives the children opportunities to become aware of differences in various articulatory configurations (Öster 1996). Visual reinforcement can help the child to increase the rate of response.

The aim of the therapy using OPTACIA is to strengthen, establish, and hopefully maintain correct articulation. The therapist has to record the sounds of interest to create the target for training, manually label the utterances, train the map, design and save the map, or select a predefined one. The articulator configuration corresponds to map position and articulator movement corresponds to map trajectory. In this way an individual child is provided with real time visual feedback about her/his speech. It is possible to re-use data collected during therapy sessions to re-train the map. Figure 8.18 shows a designed map during training of the front rounded Swedish vowel [ʉː] (u) and the back rounded Swedish vowel [uː] (o).
The targets are shown by squares and the child’s productions by circles. It can be seen in the figure that the child is making progress towards the targets. This type of feedback is criterion-referenced navigational.

Figure 8.18. A visual map during training designed to correspond to correct production of the front rounded Swedish vowel [ʉː] (u) and the back rounded Swedish vowel [uː] (o). The targets are shown by the small squares and the child’s productions by the circles.

The technique was tested with the speech of three hearing-impaired subjects for Swedish fricatives (Öster et al., 2003). The results demonstrated the utility of the mapping techniques for visually portraying correct and deviant pronunciations and indicated the potential use in actual clinical practice with hearing-impaired children.

To produce the map, it was necessary to have training data in the form of time-annotated wave files of recorded speech, with target labels for the speech sounds that are to appear on the map. The design of the Swedish sibilant fricatives map with acoustic and phonetic information of the target sounds is shown in Figure 8.19. The phonetic symbols are presented in the Speech Assessment Methods Phonetic Alphabet (SAMPA), which constitutes the best robust international collaborative basis for a standard machine-readable encoding of phonetic notation (see Appendix 2 for more details). The acoustic difference between the three Swedish sibilants is represented by the vertical axis, which shows frequency range. By the positions of the fixed points along the axis the relationship between the acoustics and the articulation of each Swedish sibilant fricative is demonstrated. Articulatory targets for training each sibilant together with either a front spread [i:] or a back rounded [u:] in syllables are also inserted in the map, providing visual paths during training.

Figure 8.19.
Design of the map for training of the Swedish sibilant fricatives, in SAMPA symbols.

The data files were recorded and manually labelled in the Wavesurfer application, http://www.speech.kth.se/wavesurfer/ (Sjölander & Beskow, 2000). The recorded speech of three profoundly hearing-impaired children served as test material for the map. The children (one girl and two boys) were 16 years of age and belonged to a school for the deaf in Stockholm. The children were recorded when they repeated CVC combinations. All the produced combinations of (sis) [siːs] and (sos) [suːs] were selected and played back to the map. The results of two of the children (one girl and one boy) can be seen in Figure 8.20.

Figure 8.20. The result of two hearing-impaired children’s repetitions of (sis) and (sos) played back to the map. The left panels are (sis) and the right panels are (sos). The upper panels are from the girl and the lower ones from the boy.

The upper left panel shows the girl’s production of /sis/. In this production, she is quite successful at producing the fricative, but the vowel maps onto the [u:] instead of the intended [i:]. This could be the result of an incorrect [i:] production which includes lip-rounding and strong nasalization. In the upper right panel, the same girl has intended a pronunciation of /sus/ but here the map indicates the instability of her fricative. Her /u/ vowel is correctly pronounced. In the lower left panel, the result of the boy’s /sis/ pronunciation is shown. He has more difficulty with the fricative /s/ than the girl, but he comes closer to an acceptable pronunciation of the /i/ vowel. Finally, in the lower right panel, the result of the boy’s /sus/ is shown. In this attempt, his first /s/ is quite successful, as is his vowel. It is the /s/ in final position which he has difficulty with.
The results for the other boy were not as clear. As he had low intelligibility and a very indistinct articulation, the map showed no targets or trajectories.

8.6.3. GRIFOS

GRIFOS is a speaker-dependent, small-vocabulary, automatic speech recognition system (based on HMMs) used in OPTACIA sessions and in the graphical user interfaces of OLPy. The acceptability of a child’s speech productions is evaluated against a trained target during speech therapy. The recogniser is used together with animated graphics for an amusing exact evaluative feedback as well as a rewarding feedback. The rewarding feedback can be of two kinds. It can be of the “hitting the mark” type, which rewards a correctly produced word or syllable and considers all productions that differ from the target as incorrect. Alternatively, it can be encouraging and reward a produced word or syllable according to the degree to which the child’s production matches the target by presenting the progress visually.

Figure 8.21 shows an example of one of the different graphical interfaces, “the Diver”, that is used for an abundant and meaningful training. A production matching the target model will cause the diver to swim up to the rock and pick a jellyfish. When the predetermined number of hits is reached, the diver swims out of the screen. A target bar is shown at the bottom of the interface. When pressing the button with the little head the child’s target model is replayed. This model is made up of the child’s best productions. A colour bar shows the number of attempts configured in the exercise.

Figure 8.21. The graphical user interface “the Diver” used together with word recognition. A production of the word “apple” is recognised as correct and the diver picks a jellyfish.

8.7. Conclusions

This chapter has reviewed many different types of visual feedback that are used in computer-based speech therapy today to motivate and assist speech- and hearing-impaired children during speech training.
The systems are product-oriented as they give parametrical feedback by showing visual representations of acoustical parameters in real time. The feedback can be of different kinds and purposes: for basic awareness and control, for evaluation, for navigation and guidance, for rewards, for repetitive and additional training, and for auditory playback and sound effects. The exercises consist of animated graphics that change in colour and form, game-like exercises, comparable models of spectra, spectrograms, and oscillograms, target speech pictures, and visual maps.

All systems are technically well developed and contain several programs to choose from, as well as many efficient types of visual feedback. However, some of the programs are more useful for children and hearing-impaired persons than for adults and hearing persons, and vice versa. Moreover, some of the visual feedback types are more efficient during certain stages of the therapy than others. These facts are brought up in the clinical evaluations of the systems reported in the following chapter. Recommended visual feedback strategies and therapy design for different users are discussed in Chapter 10.

9. Clinical evaluation studies of the three systems

9.1. Introduction

This chapter reports on several evaluations of the three systems carried out within national projects as well as within EU projects. The SpeechViewer was evaluated for Swedish with hearing-impaired children and L2 learners through case studies. Evaluations of Box of Tricks were also made by the Hungarian and Slovenian partners in almost the same manner, but the groups of children who participated in the evaluation differed between the partners. The OLP therapy and OPTACIA were evaluated by the Swedish, Greek, and British partners through an AB/BA design.
The Swedish subjects consisted of two groups of hearing-impaired children.

9.2. Clinical evaluation of SpeechViewer with profoundly hearing-impaired children

In several pre-schools and schools for deaf and hard-of-hearing children in Sweden, the SpeechViewer has become a standard and valuable complement to the regular speech training activities. This is in part due to an ongoing collaboration between the therapists and the department of Speech, Music and Hearing at KTH.

The positive effect of a systematic training of prosodic contrasts was shown in a study by Öster (1989a, 1989b, 1990). Two prelingually profoundly hearing-impaired children with difficulties in producing certain phonological contrasts in Swedish were trained systematically with the system for eight weeks in order to evaluate its efficiency.

One of the children (Child I) was a 15-year-old boy with some residual hearing in the low frequencies. He had difficulties in producing quantity differences between phonologically long and short vowels. This is an important contrast in Swedish, which is realised as a difference in duration and, for some pairs, also as a difference in vowel quality. Furthermore, the following consonant has opposite quantity, i.e., long vowels are followed by short consonants and vice versa. Before training, the child controlled only the phonological contrast between long and short (o) [uː - ʊ]. His realisations of short (å) [ɔ] and short (u) [ɵ] were always produced long. In some cases he did not control the vowel quality. It was not the intention to train vowel quality in this study, but in some cases the pronunciation improved or became more stable after the durational training.

Child II was a 13-year-old girl who had difficulties in producing distinctions between voiced and voiceless velar stops. Her hearing threshold was within the region of vibration.
The children were video recorded before and after training while they read minimal word pairs containing the phonological contrasts of vowel length and voicing. Short sentences in which the target words were included were also recorded in order to compare the pronunciation of the words in isolation with their pronunciation in running speech. Untrained word pairs that contained the same contrasts were also recorded to study the generalisation effect. Narrow phonetic transcriptions of the video recordings were made by the author using the symbols of the International Phonetic Alphabet and some diacritical marks (see Appendix 1) that have been developed for the transcription of babbling and phonetic development in early infancy (Bush et al., 1973; Grunwell, 1987; Roug, Landberg, and Lundberg, 1989).

The children were trained with the speech patterning program "Pitch and Loudness" by their therapists for about ten minutes twice a week for eight weeks. This program gives a graphical presentation of the speech signal in which the voiced/voiceless contrast is clearly indicated by different colours. Voiced sounds are red and voiceless sounds are green. The difference between long and short vowels is visible as a difference in the duration of the red colour that indicates voiced vowels. Otherwise all vowels look the same.

In Figure 9.1 the display of the speech-patterning program "Pitch and Loudness" is shown. In the upper portion the teacher’s pattern of the word pair (haka) [ɑː] (chin) and (hacka) [a] (chop) is shown and in the lower portion the child’s production before training is shown. It can be seen that the child pronounced the long quantity for both long and short /a/. It was easy to explain to the children what was deviant in their production by comparing their speech with the speech of the therapist on the split screen display.
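The duration cue described above can be illustrated with a small sketch. How SpeechViewer computes its display internally is not described here, so the following Python fragment is only an illustration under stated assumptions: it takes a hypothetical sequence of per-frame voicing decisions (True for voiced, corresponding to red on the display), assumes a frame length of 10 ms, and returns the duration of each voiced stretch. The function name and parameters are invented for this example.

```python
from itertools import groupby


def voiced_run_durations(voiced_flags, frame_ms=10.0):
    """Return the duration (ms) of each run of consecutive voiced frames.

    voiced_flags: iterable of booleans, one per analysis frame
                  (True = voiced). Hypothetical input format.
    frame_ms:     assumed frame length in milliseconds.
    """
    durations = []
    for is_voiced, run in groupby(voiced_flags):
        n_frames = sum(1 for _ in run)
        if is_voiced:
            durations.append(n_frames * frame_ms)
    return durations


# Invented example: voiceless onset, a long voiced vowel,
# a voiceless stop, and a shorter final voiced vowel.
frames = [False] * 5 + [True] * 20 + [False] * 8 + [True] * 10
print(voiced_run_durations(frames))  # prints [200.0, 100.0]
```

Comparing the first voiced duration of a child's production with that of the therapist's model gives the same information that the child reads off visually from the length of the red stretch on the split screen.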
Figure 9.1. Display of the speech-patterning program "Pitch and Loudness". In the upper portion the teacher’s pattern of the word pair (haka) [hɑːka] (chin) and (hacka) [haka] (chop) is shown and in the lower portion the child’s production before training is shown. It can be seen that the child only mastered the long quantity of the vowel /a/.

Figures 9.2 and 9.3 show the subjective assessment (done by the author) of correctly produced quantity of long and short vowels by Child I before and after training. The result shows that, in spite of the limited amount of data, the child learned to produce the short versions correctly after training. Short vowels improved more than long vowels. An improvement in the production of durational contrast between vowels after systematic training was found in all vowels except for /y/. An improvement was also found in running speech and in untrained words, similar to that found in trained words.

[Bar chart: "Correctly perceived quantity of long Swedish vowels"; percentage correct (0-100 %) before and after training for the vowels O (n=2), Å (n=5), A (n=8), I (n=8), E (n=4), Y (n=2), Ö (n=4), U (n=2) and the mean.]

Figure 9.2. Subjective evaluation of correct duration of Swedish long vowels before and after training. The vowels are represented by orthographic symbols.

[Bar chart: "Correctly perceived quantity of Swedish short vowels"; percentage correct (0-100 %) before and after training for the vowels O (n=2), Å (n=5), A (n=8), I (n=8), E (n=4), Y (n=2), Ö (n=4), U (n=2) and the mean.]

Figure 9.3. Subjective evaluation of correct duration of Swedish short vowels before and after training. The vowels are represented by orthographic symbols.

Child II also improved in producing voicing contrasts with the help of the SpeechViewer training. Before training, (g) was pronounced without voicing in medial position. In final position, (g) was omitted. However, after training, (g) was pronounced as a voiced velar stop in both medial and final positions.
This child also produced the differences in voicing between velar stops when she read the words in isolation as well as when she read the words in running speech. As with Child I, the improvement in trained words was also found in untrained words.

SpeechViewer was also introduced in the pre-school training of the Danderyd Hospital in Stockholm. The ambition was to take care of early skills during the sensitive period for learning and to train individually such abilities as respiration, loudness, pitch and voice quality before the children join the speech clinic of a school for deaf children.

Results were reported (Öster 1995c, 1995d, 1996) for a five-year-old prelingually deaf boy (D), who used sign language for communication. His pure-tone averages were 78 dB in the right ear and 102 dB in the left ear. His phonation was too high-pitched and monotonous, at around 700 Hz, which he could not perceive or control by himself. This was very disturbing for those closest to him, and he was constantly reduced to silence and faced with irritation. His teacher wanted him to learn and establish a natural pitch with a voice he could make use of. During training D wore his hearing aids.

By way of introduction it was important to get D to understand that what happened on the screen was dependent on his phonation and pitch. We started the training with one of the simple pitch awareness modules to increase his awareness of vocal pitch and to quantify his pitch range. A helicopter changes position vertically as pitch variations occur, and two white marks indicate the minimum and maximum pitch attained, see Figure 9.4. D’s pitch was high but his range was small: 688-756 Hz. D observed the helicopter on the highest floor but he was unable to lower it. The feedback made him aware of his vocal pitch but it did not show him how to lower it. Instead we used the “Pitch and Loudness” patterning program, where time is represented on the horizontal axis and pitch along a vertical frequency scale in Hz.
The split screen in Figure 9.5 displays the teacher’s input of a sustained /ɑ:/ with a natural pitch in the upper portion and D’s phonation during 3.8 seconds, with a mean of 604 Hz, in the lower portion of the screen.

Figure 9.4. Display of the module “Pitch” where D’s pitch and range are shown.

Figure 9.5. Display of the speech patterning program “Pitch and Loudness”. In the upper portion the teacher’s phonation during 4.4 seconds is shown. In the lower portion D’s sustained spontaneous phonation during 3.8 seconds is shown.

The next step was to get D to vary his pitch. He touched the teacher’s larynx and observed her pitch variation in the upper part of the screen many times over. In the lower part of the screen in Figure 9.6 the positive result of D is shown during 5 seconds.

The third step was to get D to lower his pitch. Figure 9.7 shows, in the upper part of the screen, the pattern of the teacher when she repeatedly lowered her pitch at the same time as D touched her larynx. The lower part shows that during the first 3 seconds D’s pitch was very high but suddenly it dropped to 327 Hz. He was stunned and went back to the high and varied pitch pattern to get control over his voice. From then on D varied between high and low pitch, and at a given sign by the teacher he immediately lowered his pitch.

Figure 9.6. Display of the pitch variation training with the speech patterning program “Pitch and Loudness”.

Figure 9.7. Display of the training to get D to lower his pitch with the speech patterning program “Pitch and Loudness”.

Figure 9.8 shows a typical training session of repeated phonation with a natural pitch.
To vary the training and strengthen his control of pitch, D also tried skill-building programs like the one in Figure 9.9, where he had to produce correct pitch variations to control the vertical movements and sustained voicing to control the horizontal movement of an object toward targets arranged in a curve.

Figure 9.8. Display of a typical training session of sustained and repeated phonation with a natural pitch.

Figure 9.9. Display of the skill-building program “Pitch”. D steered the figure towards the gold pieces with his voice by varying his pitch between 275 and 325 Hz.

Figure 9.10 shows that D learned and established a lower pitch more natural for his age, and that he gained awareness of and control over his voice. The upper portion of the screen shows his spontaneous phonation two weeks after D finished training. The average pitch was 266 Hz. The lower portion of the screen shows his spontaneous phonation eight weeks after training. The average pitch was 263 Hz and his voice was the soft, pleasant voice of a five-year-old boy.

Figure 9.10. The upper portion shows D’s spontaneous phonation two weeks after finishing his training and the lower portion shows his spontaneous phonation eight weeks after finishing training.

9.3. Clinical evaluation of SpeechViewer with L2 learners

Three studies by Öster (1997, 1998, 1999a) reported strategies and results from a project carried out together with the Unit for Languages and Educational Research and Development at KTH, in which this new L2-teaching strategy was used with 13 adult international engineers (9 males and 4 females) learning technical Swedish. The learners participated in a total of six half-hour training sessions twice a week, training both perception and production of spoken Swedish. All of them but one had an academic degree in engineering from a university in their home country. Their ages varied from 25 to 46 years.
Six of them originated from Bosnia, the others from Cuba, Peru, Saudi Arabia and Russia. The Speech Patterning Module “Pitch and Loudness” was used and some of the learners also used the skill-building modules.

Before training, the speakers were recorded when they read a text of 17 sentences and 110 isolated words. This diagnostic speech material contained all Swedish vowels (long and short) and all Swedish consonants in initial, medial and final positions. Two trained phoneticians transcribed the recorded words and an assessment of each L2 speaker’s individual segmental deviations was made. The recorded sentences were evaluated by the phoneticians according to the speakers’ accuracy in producing stress patterns, intonation, word accent and rhythm. The segmental and suprasegmental deviations found formed the basis for each speaker’s individual speech material in the pronunciation training that followed.

By means of the audio-visual feedback given in the module “Pitch and Loudness”, efficient training of both perception and production of spoken Swedish was viable. It was easy to make the learners aware of the way in which their production deviated and to show them how to correct specific deviations. In the following two figures (9.11 and 9.12) the visual information that this module provides can be studied.

Figure 9.11. The upper portion shows the teacher’s model of a correctly produced voicing contrast between the initial plosives /p/ and /b/. The lower portion shows a deviant production of a Bosnian speaker.

The program displays F0 and intensity in real time. The split screen provides a comparison of a learner’s utterance with a model of the teacher. In Figure 9.11, the voiced and voiceless contrast is clearly indicated by different colours in the loudness application of the module “Pitch and Loudness”.
Voiced sounds are red (dark) and voiceless sounds are green (light). In the upper portion, the teacher’s correct production of the word pair /puss/ (kiss) and /buss/ (bus) is shown and in the lower portion a Bosnian speaker’s production is shown. It can be seen that the Bosnian speaker has difficulties in producing the phonetic contrast between the initial unvoiced aspirated plosive and the voiced plosive in Swedish.

Figure 9.12 shows a training session of the intonation pattern of the Swedish sentence “Karin sjunger i sängen” (Karin sings in bed) in the application where fundamental frequency and intensity are displayed in real time. Stressed syllables are visible on the teacher’s screen through pitch modulation, longer duration and higher intensity (thickness of the line). On the lower screen, a Bosnian speaker’s quite monotonous production is shown. The contrastiveness of the visual patterns gives the learner possibilities to discriminate between various distinctive features that underlie phonological contrasts in Swedish.

Figure 9.12. In the upper portion, the teacher’s correct intonation pattern of the phrase “Karin sjunger i sängen” (Karin sings in bed) is shown. In the lower portion, a Bosnian speaker’s intonation pattern is shown.

While the specific structure of spoken L2-teaching may vary somewhat from teacher to teacher, there are certain general steps to be followed to obtain efficient training. The following steps, discussed earlier in section 2.3.5, are recommended:

• Diagnosis of individual deviations
• Instructions that aim at awareness and correct realisation
• Initial training to obtain automaticity
• Additional and repetitive training for generalisation and transfer to linguistic use

9.3.1. Diagnosis of individual deviations

The first step involves an assessment of individual deviations that should be trained.
Examples of individual results of a phonological assessment (Öster 1997) of segmental deviations by a group of six Bosnian and four Spanish (Cuba, Peru) speakers are shown. Deviantly produced Swedish vowel quality is shown in vowel diagrams and deviantly pronounced consonants are inserted in a table that shows applicable positions. A summary of general prosodic deviations is also shown.

Vowels

Displaying vowel deviations in diagrams was a good pedagogical aid to describe the role of the tongue in vowel production. The Swedish language has 9 phonologically long and 9 short vowels. As previously stated, quantity is an important contrast in Swedish, which is realised as a difference in duration and, for some pairs, also as a difference in vowel quality. Furthermore, the following consonant has opposite quantity; i.e. long vowels are followed by short consonants and vice versa. High and mid-high long Swedish vowels are diphthongised in open syllables. Noteworthy is also the number of Swedish front vowels and the fact that some of them are rounded. It is often difficult for L2 speakers to have a complete productive and perceptual mastery of these vowels (Bannert, 1990). The diagrams in Figures 9.13 and 9.14 show typical vowel confusions and difficulties that the L2 learners had depending on their mother tongue.

Figure 9.13. Deviantly produced Swedish vowel-quality by six Bosnian speakers. Lines and arrows indicate typical confusions.

Figure 9.14. Deviantly produced Swedish vowel-quality by four Spanish speakers. Lines and arrows indicate typical confusions.

Consonants

Deviantly produced consonants are inserted in tables that show all possible positions. Figures 9.15 and 9.16 show the deviations made for those consonants in initial, medial and final word position (cf. section 7.3.2). Blank cells indicate that the consonant was pronounced correctly.
Shaded cells indicate positions where the consonant was not phonotactically possible. The consonant deviations that were trained depended on the mother tongue of the L2 speakers. Some of the Swedish consonants caused production and perception difficulties for the L2 speakers, especially the voicing contrast between plosives, aspirated voiceless plosives, the great number of fricatives, the nasal velar sound and the fact that dentals are retroflexed when they are preceded by /r/.

Figure 9.15. Deviantly produced Swedish consonants by six Bosnian speakers.

Figure 9.16. Deviantly produced Swedish consonants by four Spanish speakers.

Prosody

Prosodic deviations were evaluated and summarized by two phonetically trained persons when listening to the recorded text. The most common prosodic deviations made by both groups of L2 speakers were:

• Perceiving and producing accent II
• Producing a long consonant after a short vowel in stressed syllables
• Producing quantity differences between phonologically long and short vowels.

9.3.2. Instructions that aimed at awareness, correct realisation, and understanding

It was important to make the learner aware of the way in which his/her production deviated and to show him/her how to correct these deviations. Instruction was based on the possibility of comparison offered by the module “Pitch and Loudness”. The most instructive speech material consisted of minimal word pairs that contained the contrast that the learner produced deviantly. When studying the following figures it is obvious that this contrastive training using audio-visual feedback had a certain effect on the L2 speakers' productions. Figures 9.17 and 9.18 show positive results after six half-hour training sessions twice a week on the same material as shown in Figures 9.11 and 9.12 (same speakers).

Figure 9.17.
The upper portion shows the teacher’s model of a correctly produced voicing contrast between the initial plosives /p/ and /b/. The lower portion shows a correct production by a Bosnian speaker after some training (cf. Figure 9.11).

Figure 9.18. The upper panel shows the teacher’s correct intonation pattern of the phrase “Karin sjunger i sängen” (Karin sings in bed). In the lower panel, a Bosnian speaker's improvement after training is shown (cf. Figure 9.12).

9.3.3. Further training to establish automaticity and transfer to untrained material

Skill building modules were also used which provide a game-like strategy to strengthen ability in refined articulation. Phonemes produced by the learner were matched against models by comparing target spectra. To avoid the problem of interspeaker variability the L2 speaker's own “best production” was stored as a target.

9.3.4. Results of a questionnaire with the thirteen L2 speakers

After the training period the L2 speakers were asked some questions concerning this new L2 teaching strategy. In the following figure some of their answers are summarised (Figure 9.19).

[Five bar charts showing the number of respondents (0-13) per answer. "Have you learned something?": very much, much, a little, nothing, no opinion. "What is your opinion of the training?": excellent, very good, indifferent, don't like it, no opinion. "What is your opinion of the system?": excellent, very good, indifferent, don't like it, no opinion. "Do you speak more Swedish now?": much more, more, a bit more, no more, no opinion. "Are you more aware of the pronunciation now?": much more, more, a bit more, no more, no opinion.]

Figure 9.19. Results from a questionnaire with thirteen L2 speakers after some experience of audio-visual speech training provided with the SpeechViewer.
Below, some important comments made by the L2 speakers after training with SpeechViewer are summarised:

• "I feel more confident now"
• "I realise the importance of a good pronunciation"
• "Now I immediately notice by myself when I pronounce something deviantly"
• "I am listening to other people more than before, trying to imitate their pronunciation"
• "Nowadays I am aware of my pronunciation"
• "I think that I talk more like a Swede now"
• "I talk more slowly now"
• "I want to train a lot more. My Swedish is not automatised"
• "It seems as if I have lost my identity"

9.4. Evaluation studies with Box of Tricks

This system was developed for speech- and hearing-impaired children aged 4-10 within the EU project “SPECO” between 1998 and 2001. The system was developed for four languages: Hungarian (Varázsdoboz), English (Box of Tricks), Swedish (Trollerilådan), and Slovenian (ARTI), see section 8.4.1. The main objective of the project was to develop a user-friendly system through the combination of prerecorded training vocabularies, comparable targets of a reference speaker, symbolic speech pictures for each phoneme, and illustrated training words.

During the development of the system, all partners kept in close contact with speech therapists from different educational fields, repeatedly asking them for their opinion. When the program and the User Manual were finished they were given to the therapists for evaluation.

Two types of evaluations were made. One was based on the therapists’ answers to a questionnaire that was constructed and used for all four languages. The therapists answered the questionnaire after having worked with the program for between three months and half a year. Another clinical evaluation was made by the Slovenian and the Hungarian partners.

9.4.1.
Results of the Swedish questionnaire

The children who were trained with the program were between 4 and 12 years of age and consisted of three groups: hearing-impaired pre-school children, profoundly hearing-impaired children from a school for the deaf in Stockholm, and normal-hearing children with central language impairment. The evaluation was based on the answers of eight speech therapists who worked with the system for three months. Among other things, the therapists reported that:

• “The program supplies a long-felt need. There are so many computer-based language-training programs but so few speech-training programs on the market”
• “The sound preparation exercises attracted the children the most”
• “The children were very curious and wanted to see all the illustrations before starting therapy”
• “The auditory feedback of their own pronunciation stimulated them a lot”
• “Even if the children did not understand the acoustic pattern they got an understandable feedback by comparing their pattern with that of the reference speaker”
• “In the beginning some children had problems in focussing on the most important things”
• “Most children were very interested and co-operated”
• “Every time they came to school they wanted to work with the program. The program never failed. It was technically extremely stable”
• “Both the visual and auditory feedback worked without any problem.
The program seems to be very well thought-out and carefully planned”
• “Using symbolic cards illustrating each speech-sound was a good idea”
• “Some parts of the program depend on the maturity and ability of the child”
• “Nice colours, and the illustrations are beautiful and clear”
• “If the child is motivated the program seems to be very effective”
• “In the beginning the program was difficult to grasp”
• “The program is very big and sometimes it was difficult to find the way to the accurate exercise”
• “The more I work with the system the more I appreciate it”
• “I especially valued the fixed vocabularies and the reference speaker”

For a full report about the Swedish questionnaire see Appendix 3.

9.4.2. Results of the Slovenian questionnaire

Five therapists answered the Slovenian version of the questionnaire. All therapists were speech and language pathologists. Three of them had used the program for three months, one for five months and one for seven months. Sixteen children between the ages of 5 and 15 worked with the system. Five of them were speech-impaired, three were hearing-impaired, three were deaf, and five children had a cochlear implant.

In general, the therapists were very satisfied with this method of speech therapy. Their opinion was that the period of evaluation was too short to give more than indicative conclusions. However, the method was considered to be very useful, especially as a supplement to traditional methods. It was a much appreciated change in their work, pleasant for the children and useful for therapy of children with speech and/or hearing disorders.

From a pedagogical point of view the system was highly rated. The user manual offered the therapists enough help and information. They were all of the opinion that there was no need for additional exercises. The children who had used the program showed a lot of interest and loved to work with it.
They all understood the logic of the speech pictures and the background pictures and found all illustrations to be very amusing and colourful. In general, the opinion of the therapists was that the program stimulated the children to work for a longer time with each exercise and also motivated them to repeat the exercises.

9.4.3. Results of the Hungarian questionnaire

The survey was conducted at the “Török Béla” Hard of Hearing School in Budapest. Severely hearing-impaired children, hearing-impaired children, deaf children (sign language), children with cochlear implants and normal-hearing children with difficulties in understanding and learning speech participated in the evaluation. A summary of the therapists’ answers follows below:

1) Children can use this multimodal system by themselves very easily and they use it with pleasure, which is a very important factor from the point of view of efficiency.

2) The visual feedback helps children to see whether their pronunciation is correct or not, and how far it is from the correct one. They do not need to rely only on the teacher's opinion. This is particularly important for speech-impaired children with hearing loss.

3) In general, sounds were formed sooner than in control groups that did not use any computer-based teaching system. It was found that a consistently shorter time was required for improving a speech sound than was the case with corresponding children of similar mental ability and impairment level who had been instructed by the traditional method. However, it is difficult to express the results in quantitative data because the result depended on many other factors (for example, one highly important factor was how much additional help the child received at home).

4) In those cases where these sounds were very resistant to the traditional therapy, the new method helped to repair these sounds.
5) The system is a useful tool for teachers in individual linguistic training. It also makes it possible to train in small groups, and pupils can use the system themselves and practise alone. Of course, especially for young children, the visual tool itself does not replace the work of the speech therapist. Firstly, this tool is a good aid. It helps the work of the therapist and gives variety to the teaching. Secondly, in the automation phase or with older children, the visual tool itself gives a good opportunity to practise alone.

9.4.4. Clinical evaluation of the Hungarian version of Box of Tricks

The aim of the clinical evaluation of the Hungarian version, called the SPECO-method or Varázsdoboz, was to study whether the system was effective, whether the intelligibility improved after training, and whether the children maintained a good pronunciation after therapy (Vicsi, 2001).

The result of speech therapy with the Hungarian version of the system and with traditional therapy (without computer-based visual feedback) was compared using listening tests. Forty children between six and eight years of age were selected and grouped according to their degree of hearing impairment. Eight children had normal hearing, eight children had a mild hearing impairment, eight children had a moderate hearing impairment, eight children had a severe hearing impairment, and eight children (two from each group) constituted the control group (traditional therapy). All children except the control group constituted the test group.

All children were recorded when they read eighteen words before therapy, immediately after therapy, and five months after therapy. Thirty persons, inexperienced with the speech of hearing-impaired children, participated in a listening test where the same word recorded on three different occasions was presented from each speaker.
The words were presented as word-pairs: before and immediately after therapy, before and five months after therapy, and immediately after therapy and five months after therapy. The listeners were asked to decide which of the two words was the more intelligible. The results of the listening tests are presented in Figures 9.20-9.22. The effect of therapy is evident. Most of the time the listeners found the words after therapy to be more intelligible than the words before therapy. Intelligibility was higher for the groups that had been trained with the Box of Tricks than for the control group, which was trained only with traditional methods.

[Figure 9.20: bar chart, “Most intelligibly pronounced word before and immediately after therapy”. Vertical axis: % (0-100). Groups: Normal, Mild, Moderate, Severe, Control group, Test group. Legend: before therapy, immediately after therapy, indecisive.]

Figure 9.20. Effects of therapy with four groups of children. Results from a listening test by 30 inexperienced listeners, who judged words pronounced by the different groups of children before therapy and immediately after therapy (see text for more details).

[Figure 9.21: bar chart, “Most intelligibly pronounced word before and 5 months after therapy”. Vertical axis: % (0-100). Groups: Normal, Mild, Moderate, Severe, Control group, Test group. Legend: before therapy, 5 months after therapy, indecisive.]

Figure 9.21. Results from a listening test by 30 inexperienced listeners, who judged words pronounced by four different groups of children before therapy and 5 months after therapy.

[Figure 9.22: bar chart, “Most intelligibly pronounced word immediately after and 5 months after therapy”. Vertical axis: % (0-100). Groups: Normal, Mild, Moderate, Severe, Control group, Test group. Legend: immediately after therapy, 5 months after therapy, indecisive.]

Figure 9.22.
Results from a listening test by 30 inexperienced listeners, who judged words pronounced by four different groups of children immediately after therapy and 5 months after therapy.

When comparing the groups with each other it can be seen that the normally hearing children improved more than the other groups immediately after training with the system. The eight children with a moderate hearing impairment improved more than the other groups of hearing-impaired children after therapy. It is also clear that the intelligible pronunciation obtained after therapy was established and maintained five months after therapy.

9.4.5. Clinical evaluation of the Slovenian version of Box of Tricks

The Slovenian version, called “ARTI”, was evaluated in the Speech and Hearing Centre in Maribor, Slovenia. Thirty-one children between the ages of 5 and 15 years used the system for three months. The children were divided into four groups: 11 children were speech-impaired, 6 were hearing-impaired, 5 were deaf and used sign language, and 9 children had a cochlear implant. Each group was further divided into a control group (15 children in total) and a test group (16 children in total). The children in the test group were trained with the system and the children in the control group were trained with traditional methods, without computer-based therapy with visual feedback. A listening test was performed to evaluate the effect of speech therapy with ARTI. Recordings were made when each child read eighteen words before and after therapy. There were two types of listeners: naive listeners and listeners experienced with the speech of hearing-impaired children. Eight naive listeners and seven speech therapists from the Speech and Hearing Centre in Maribor listened to the utterances and had to decide which of the two pronunciations of the same word was the better.
The order of utterances for each word was randomized and the listeners were not aware which pronunciation was recorded before therapy and which was recorded after therapy. For all listeners taken together, the estimated improvements of the test groups relative to the improvement of the control group showed that the deaf children improved more in intelligibility after therapy with the system (18.65%) than the hearing-impaired children did (8.25%), see Figure 9.23. The children with cochlear implants improved by only 0.1%, while the speech-impaired children did not improve in intelligibility after therapy with the system. They were judged to be 4.5% less intelligible after therapy with the system than the control group, which had only traditional therapy without computer-based training with visual feedback. Figure 9.23 shows the result of the listening test in terms of all listeners’, naive listeners’, and speech therapists’ choice of the best pronounced word, before or after ARTI-therapy, for the four different groups of children, compared to the result of the control group. The result of the listening test with experienced speech therapists shows that all hearing-impaired children who used the system improved in intelligibility compared to the children who were trained with traditional methods. The improvement was biggest for the deaf children using sign language. However, the results of the listening test with naive listeners show that the children with cochlear implants did not improve after training with the system when compared to the control group. The results can be regarded as indicative only, because the training phase was rather short and the children would probably have needed more time to adapt to the new method.
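The comparison behind the percentages above can be illustrated with a small calculation. The text does not spell out the formula, so the sketch below assumes that a group's improvement is the share of listener judgments preferring the post-therapy recording minus the share preferring the pre-therapy recording, and that the relative figure is the test group's value minus the control group's value. The function names and the counts are hypothetical and are not data from the ARTI evaluation.

```python
def improvement(after_votes, before_votes):
    """Net share (in %) of forced-choice judgments favouring the post-therapy recording.

    Assumed definition: the source does not state how improvement was computed.
    """
    total = after_votes + before_votes
    if total == 0:
        raise ValueError("no judgments recorded")
    return 100.0 * (after_votes - before_votes) / total


def relative_improvement(test, control):
    """Test-group improvement minus control-group improvement, in percentage points."""
    return improvement(*test) - improvement(*control)


# Hypothetical (after, before) vote counts, for illustration only.
test_group = (60, 40)
control_group = (50, 50)
print(relative_improvement(test_group, control_group))  # 20.0
```

Under this assumed definition, a negative value would correspond to a test group that was judged to have improved less than its control group, as reported above for the speech-impaired children.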
[Figure 9.23: bar chart, “Estimated improvements in intelligibility after ARTI-therapy relative improvements after traditional therapy”. Vertical axis: % (-10 to 25). Groups: hearing-impaired, deaf, cochlear implant, speech-impaired. Legend: all listeners, naive listeners, speech therapists.]

Figure 9.23. Result of the listening test: all listeners’, naive listeners’ and speech therapists’ estimation of the best pronounced word, before and after ARTI-therapy, for four different groups of children compared to the result of the control group.

9.4.6. Clinical evaluation of the Swedish version of Box of Tricks

A controlled clinical evaluation of the Swedish version has not yet been performed. The system is available as commercial software in Sweden and includes a training course on how to use the system in the most effective way. Through these courses, continued and renewed contact with the users has shown that the therapists as well as the children appreciate the system, as it is user-friendly and amusing and helps the children in their speech acquisition.

9.5. Clinical evaluation of the Swedish version of the OLP method with hearing-impaired children

9.5.1. Introduction

In Sweden, therapy with the OLP method was compared to traditional therapy for children with hearing impairment. The evaluation gave details about the effectiveness, feasibility and accessibility of the OLP therapy method in the treatment of articulation disorders in hearing-impaired children. The OLP therapy method was designed to increase intelligibility and accuracy of articulation by providing rewarding, navigational, and encouraging visual feedback.

9.5.2. Method

Two speech therapists from the Manilla Deaf School in Stockholm were enrolled in the program during the iterative development of the OLP therapy method.
They practised selecting and running the exercises via the OLPy user interface, recorded their own speech and set up exercises for practice, tried to understand how to use the system with hearing-impaired children, reported problems and gave suggestions for improvement. When the OLP software was sufficiently developed, the OLP prototype was clinically evaluated by the speech therapists with some of their children. Three important questions were posed in order to evaluate the OLP therapy method with hearing-impaired children:

• Is the OLP method of therapy effective in increasing the intelligibility of the speech of:
   • prelingually hearing-impaired children between 8 and 14 years of age with a moderate to severe hearing loss?
   • prelingually hearing-impaired children between 8 and 14 years of age with a severe to profound hearing loss?
• What degree of transfer is there of speech skills to untrained speech material?
• Does the OLP therapy method function in a real environment?

9.5.3. Subjects

Eight children were recruited for the evaluation. The children were between 8 and 14 years of age and their pure tone averages (PTA) in the best ear were between 55 and 108 dBHL. Age, sex, and hearing data for the six children who completed the evaluation are given in Table 9.1 and Table 9.2. All children used sign language for communication and fulfilled the specified inclusion/exclusion criteria. This meant that they all had good vision and cognition and a reading age of over 8 years.

Table 9.1. Age, sex, and hearing loss in the better ear for three children with a moderate to severe hearing impairment who participated in the clinical evaluation.

Group B(OLP) A: Moderate to severe hearing impairment
Hearing loss in the better ear, dB, at 125, 250, 500, 1000, 2000 and 4000 Hz

Child   Age   Sex   Hearing loss, dB          PTA, dBHL
1       9     F     20 45 40 80 95            55
2       11    F     50 60 70 70 70 60         70
3       11    M     55 65 65 80 85 115        77

Table 9.2.
Age, sex, and hearing loss in the better ear for three children with a severe to profound hearing impairment who participated in the clinical evaluation.

Group A B(OLP): Severe to profound hearing impairment
Hearing loss in the better ear, dB, at 125, 250, 500, 1000, 2000 and 4000 Hz

Child   Age   Sex   Hearing loss, dB          PTA, dBHL
4       14    M     70 55 70 70 105 105       81
5       10    M     80 90 85 105 > >          103
6       8     F     85 90 100 > >             108

All children had sufficiently well developed proprioception, imitation and exteroception skills to be able to benefit from speech training and the support of technical devices. However, only six children finished the evaluation. Two children discontinued the trial for various reasons.

9.5.4. Treatment

It was decided to evaluate the OLP therapy method through an AB/BA design in order to control for carry-over effects from one treatment condition to the other. ‘A’ represented traditional therapy, where explanations of manner and place of articulation were given in sign language if needed and demonstrated through tactile and visual feedback using a mirror, articulation pictures, or the therapist as a model. ‘B’ represented therapy with the OLP method. Three children with severe to profound hearing impairment received treatment A then B (OLP) and three children with moderate to severe hearing impairment received B (OLP) then A, as shown in Table 9.3. All children trained twice a week for about 20 minutes each time during two periods of five weeks each. Each child required treatment for specific sounds with specific therapy objectives.

Table 9.3. AB/BA design of the clinical trial. Traditional versus computer-assisted treatment with two groups of hearing-impaired children. Each square represents one week.
Group AB   X A A A A A X B B B B B X
Group BA   X B B B B B X A A A A A X
X = assessment

Before therapy started and after each therapy phase all children were recorded when they read two lists of words consisting of treatment and non-treatment words. These recordings were assessed to investigate therapy effects.

9.5.5. Therapy objectives

Before any treatment, the target therapy objectives were selected for each child, based on the children’s need for articulation training and on the therapists’ familiarity with each child’s speech deviations. The objectives were the same for both treatments, A and B. All children required treatment for specific sounds. The range of different therapy aims was small because targets such as changing pitch or decreasing volume were not important for these children to train, since they all had rather good control of these parameters. A list of treatment words containing the therapy objectives was constructed for each child. This was used in all therapy sessions. A list of non-treatment words with the same structure as the treatment words was also created for each child. These words were not used during therapy. To investigate therapy effects, all children were recorded when they read all words of the two lists 1) before any training, 2) after OLP training and 3) after traditional training.

Child 1 (PTA, better ear 55 dBHL)
This girl was 9 years of age and had difficulties with /s/, /l/ and consonant clusters. She also omitted final consonants in words.
It was decided that her treatment objectives in the work with Optacia and Grifos should be syllables and words containing the following vowels and consonants:

Visual maps (Optacia)
• Syllables: LA, LI, LO, SA, SI, SO
• Long Swedish vowels
Automatic recognition with graphical interfaces (Grifos)
• Syllables
• Single words with /s/ in initial, medial, final word-position and the cluster [st]
• Single words with /l/ in initial, medial, final word-position and the cluster [bl]

Child 2 (PTA, better ear 70 dBHL)
This eleven-year-old girl’s most serious problem was the pronunciation of the Swedish sibilants and the orthographic letter (x) [ks]. Her treatment objectives in the work with Optacia and Grifos were decided to be isolated sibilants and words containing the following consonants:

Visual maps (Optacia)
• Isolated sibilants: /s, ç, ɧ/
Automatic recognition with graphical interfaces (Grifos)
• Single words with /s, ç/
• Single words with /ɧ/ in initial, medial, final word-position
• Cognates
• Single words with (x) [ks]

Child 3 (PTA, better ear 77 dBHL)
Child 3 was an eleven-year-old boy who had great difficulties with the /f/ and /v/ sounds. He always substituted /f/ with /ç/ and /v/ with /d/. He was extremely interested in sports and his treatment words were decided to be sports words containing the following consonants:

Visual maps (Optacia)
• Isolated /s, ç/
• Syllables: /f, ç/ + long vowels
• Syllables: /v, d/ + long vowels
Automatic recognition with graphical interfaces (Grifos)
• Single words with /f, v/ in initial, medial, final word-position

Child 4 (PTA, better ear 81 dBHL)
This boy was fourteen years old and his greatest pronunciation problems were the Swedish sibilants and the consonants /k/ and /ŋ/.
His objectives were set to:

Visual maps (Optacia)
• Isolated sibilants
• Syllables: sibilants together with [u:]
Automatic recognition with graphical interfaces (Grifos)
• Single words with /s, ç, ɧ/ and /k, ŋ/ in initial, medial, final word-position

Child 5 (PTA, better ear 103 dBHL)
This 10-year-old boy had disordered pronunciation of all sibilants and substituted /m/ with /b/. He also confused the production of the vowels /u:, ʉ:, o:/. His training material was as follows:

Visual maps (Optacia)
• Isolated sibilants and sibilants together with /u:, ʉ:, o:/
• Isolated vowels /u:, ʉ:, o:/
Automatic recognition with graphical interfaces (Grifos)
• Single words with /s, ç, ɧ/ in initial, medial, final position
• Single words with /m/ and /b/

Child 6 (PTA, better ear 108 dBHL)
The last child was an 8-year-old girl with similar difficulties. All sibilants and consonant clusters were difficult for her to pronounce intelligibly. Her objectives were:

Visual maps (Optacia)
• Isolated sibilants
Automatic recognition with graphical interfaces (Grifos)
• Single words with /s, ç, ɧ/ in initial, medial, final position
• Single words with consonant combinations [ls], [sl], and [mp]

9.5.6. Assessments

Three different assessments were performed: to study the effects of treatment A and/or B, to study whether treatment A and/or B had a positive impact on untrained material, and to obtain the therapists’ opinion of the possibilities and limitations of the OLP therapy method.

Evaluation of the effects of AB and BA therapy by using a listening test

The evaluation used a listening test to compare the results of speech training with OLP therapy against those of traditional therapy methods. Ten naive adult listeners listened to the recorded speech samples of the treatment word list produced before and after each treatment phase (see Table 9.3), and selected the best pronunciation out of three of the same word for every child.
The order of utterances for each word was randomized so that the listeners were not aware which pronunciation was recorded before or after any therapy. The listening test was administered by the program “Judge” (Granqvist, 1998). The subjects compared the stimuli with one another and rated them by entering text. A screen of the program is shown in Figure 9.24.

Figure 9.24. Screen of the program “Judge”. The subject has to decide which pronunciation out of three sounds best: 1, 2, or 3.

Evaluation of transfer to untrained material by using a listening test

In order to assess whether the effects of treatment A or B had been generalised to untrained material, the listeners also listened to and rated the recorded speech of the separate list of non-treatment words with the same structure as those used during treatment. The words were recorded in the same way before the start and at the end of each treatment phase, and the procedure of the listening test was the same as reported above.

Questionnaire

An evaluation was also carried out based on a questionnaire (see Table 9.4) that the two speech therapists answered after finishing the OLP therapy.

Table 9.4. Questions that were answered by the two therapists after OLP therapy.

1. Do you think that experience with computers is necessary to be able to use this system?
2. Do you think that phonetic knowledge is necessary to work with this system?
3. How did the system meet your expectations?
4. Was the system easy to handle?
5. Did you consider the training meaningful?
6. Did you like the graphics?
7. How was the system from a pedagogical point of view?
8. Was the system reliable?
9. How was the interaction with the children?
10. Did the children learn anything?
11. Did the children understand the feedback from the visual maps?
12.
Did the children understand the feedback from the recognition?
13. Should any element have had more or less training opportunities?
14. Was the child motivated to train with the system?
15. Did you miss the possibilities to train something? If yes, what?
16. Additional information:
17. Try to estimate how much time you have spent for each session.

9.5.7. Results

Evaluation of the effects of treated and non-treated words

Figures 9.25-9.30 show the results of the selections made by the subjects of the six children’s best pronunciation out of three for both treated and non-treated words. Words written in capital letters indicate treated words and words written in small letters represent non-treated words.

[Figures 9.25-9.27: bar charts for Child 1 / BA, Child 2 / BA and Child 3 / BA. Vertical axis: number of listeners (0-10). Horizontal axis: treated words and non-treated words. Legend: before training, after OLP, after trad. training, no difference.]

Figure 9.25-9.27. Results of ten listeners’ ratings of the best pronunciation out of three of treated and non-treated words produced by three children with moderate hearing impairment from Group B(OLP) A. Words written in capital letters indicate treated words and words written in small letters represent non-treated words.
[Figures 9.28-9.30: bar charts for Child 4 / AB, Child 5 / AB and Child 6 / AB. Vertical axis: number of listeners (0-10). Horizontal axis: treated words and non-treated words. Legend: before training, after trad. training, after OLP, no difference.]

Figure 9.28-9.30. Results of ten listeners’ ratings of the best pronunciation out of three of treated and non-treated words produced by three children with profound hearing impairment from Group A B(OLP). Words written in capital letters indicate treated words and words written in small letters represent non-treated words.

The results show that most of the children improved their articulation after training with the OLP therapy method as well as with the traditional method. This was particularly true of motivated children with some functional hearing. Not all objectives were reached, but in all cases except for child 6 some of the targets were achieved. Children with a moderate to severe hearing loss gained more knowledge and became more skilled in pronouncing their target therapy objectives than the children who had a severe to profound hearing impairment. To some extent this might depend on the therapy order, but most likely it was due to the fact that these children had better speech processing capabilities and were more motivated for speech therapy than the children of Group AB. All children except one showed transfer of speech skills to untrained words of the same structure as the treatment words after having finished the trial.
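Each bar in Figures 9.25-9.30 is a count of how many of the ten listeners chose a given recording of a word. As a minimal sketch of such a tally, assuming each listener's answer is stored as one of four labels taken from the figure legends, the counting could look as follows. The function names and the example votes are hypothetical and do not reproduce any result from the trial.

```python
from collections import Counter

# Labels taken from the figure legends; the data layout is an assumption.
CONDITIONS = ("before training", "after OLP", "after trad. training", "no difference")


def tally(choices):
    """Count the listeners' choices for one word, one count per condition."""
    counts = Counter(choices)
    unknown = set(counts) - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {condition: counts.get(condition, 0) for condition in CONDITIONS}


def preferred(counts):
    """Return the recording chosen most often, or None if the top count is tied."""
    recordings = [c for c in CONDITIONS if c != "no difference"]
    best = max(counts[c] for c in recordings)
    winners = [c for c in recordings if counts[c] == best]
    return winners[0] if len(winners) == 1 else None


# Hypothetical votes from ten listeners for a single word.
votes = (["after OLP"] * 5 + ["after trad. training"] * 3
         + ["before training"] + ["no difference"])
counts = tally(votes)
print(counts)
print(preferred(counts))  # after OLP
```

Keeping "no difference" as its own category mirrors the figure legends, so a word on which listeners could not separate the three recordings is not forced into one of the treatment conditions.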
The most noticeable impact of the OLP therapy method was that this type of training seemed to start a process of awareness, understanding and development of the children’s speech production if it preceded traditional training, which was the case for the children of group BA. The listeners chose most of these children’s pronunciations after traditional training as the best ones, even if the OLP therapy method had paved the way.

Questionnaire

The observations and opinions of the two therapists after finishing the OLP therapy can be summarised as possibilities and limitations of the OLP therapy method. The therapists were happy with the evaluation and with the opportunity to take part in research by experimenting with a modern speech training aid under development. They thought that this method gave possibilities for cooperation, as some children wanted to participate in the management of the system. The child’s motivation to achieve correct articulation was enhanced with the system, and a correct pronunciation was easier to establish and keep in mind. They also thought that the OLP system could be excellent for motivated hearing-impaired adults outside school, giving them the possibility to train speech on their own. The most positive thing about the system, according to the therapists, was that it helped to get a child to repeat the same word or sound many, many times, which is impossible with traditional training. To sum up, the therapists agreed that Optacia served its purpose for speech therapy with moderately, severely, and profoundly hearing-impaired children. The possibility in OPTACIA to easily change the layout or to visualize articulatory movements rather than sounds, e.g. tongue retraction, nasality, lip rounding, etc., seems to be unique and promising.
In the instruction phase OPTACIA offered navigational feedback through the visual maps that was instructive and easy for the children to understand. The children became aware of how their articulation differed from correct behaviour and understood how correct realisation should be reached. With the help of the graphical interfaces of OLPy, intensive training on existing skills to establish automaticity was possible. It was obvious that the extensive training that the OLP therapy method offered also transferred skills to untrained words, as can be seen in Figures 9.25-9.30.

However, after finishing the OLP therapy the therapists also pointed out some weak points and some important negative aspects of the system. Many of these drawbacks referred to:

• the lack of user-friendliness
• the type of feedback that the system offered
• the use of automatic recognition
• the reliability of the system.

They claimed that the OLP therapy method required too much time to learn to run on one’s own, necessitated very good knowledge of computer techniques, and contained a complicated procedure to reach an exercise. They maintained that the feedback from the graphics too seldom gave the child any encouragement, that the child felt unsure when there was no feedback after a good try, and that there was no possibility of contrastive visual training, that is, of visually comparing the child’s production with a correct model. It was also found that some moving parts of the graphics moved too slowly. This caused some children to lose interest. Some graphics needed more clearly contrasting colours and some needed a more obvious start signal, as the children complained that it was not apparent to them when to start their production.
However, there was a general opinion that the visual maps of OPTACIA gave the children navigational and meaningful visual feedback and that their use during training contributed to the positive result of the OLP therapy method. The therapists nevertheless stated that they would like predefined phonetic maps to be integrated in the system to make it more user-friendly. The therapists were skeptical about using automatic speech recognition with profoundly hearing-impaired children because there was a risk that the children were training against incorrect models due to the great variability and peculiarities in their speech. Some children were also too emotional, required a lot of support, were unmotivated, and were too immature to work with automatic speech recognition as it offered limited navigational feedback. The therapists also found the recognizer to be unreliable and its feedback inconsistent.

Discussion

Three important questions were posed in order to evaluate the OLP therapy method for hearing-impaired children. The questions were: whether the OLP method of therapy was effective in increasing the intelligibility of the speech of prelingually hearing-impaired children with moderate to severe hearing losses and with severe to profound hearing losses, whether there was any transfer of speech skills to untrained speech material, and whether the OLP therapy method functioned in a real environment. All children were motivated and were happy to join the trial. They liked the graphics, considered the training meaningful and came punctually and in a good mood to the speech clinic. The results showed that the system was most useful for the children who had some residual hearing. All children of Group BA improved their production and became more aware of how to articulate their objectives more intelligibly after the OLP treatment. The OLP treatment also had a positive effect on the traditional treatment.
Starting with OLP therapy seemed to set off a process of awareness and understanding. The navigational feedback of the visual maps was easy to understand and gave the children good instructions on how to move their articulators in a correct manner. Despite the small number of children tested, the therapists thought that OPTACIA had the potential to be a useful, uncomplicated and quick method if predefined visual maps were available. To sum up, it could be said that OPTACIA as a whole tended to function in a real environment, but the graphical interfaces together with word recognition were too complicated to use with severely and profoundly hearing-impaired children, as the recognition was unstable, unreliable and very time-consuming to use in an exercise.

9.6. Conclusions

When introducing new technology as well as new pedagogical and phonetic methods for speech therapy it is extremely important to investigate the effects, efficiency, efforts, and benefits involved. However, this can be especially difficult to carry out with profoundly hearing-impaired children, who constitute a very heterogeneous group. Therefore, the studies performed are mostly case studies that follow a child for a longer period of time. Evaluations that comprise many schoolchildren, who should have the same type of therapy for the same period of time, are also difficult to perform, due to holidays, absence from school, and the necessity for the children to leave the classroom and miss important classes. For that reason statistical methods are seldom used in these kinds of assessments.

10. General Conclusions and Recommendations

10.1.
Recommendations

The expected effects of speech therapy in general are:

• to establish automaticity
• to expand the speaker’s phonological system
• to make the learner’s best production be his/her most common production
• to transfer skills to untrained situations and linguistic use

Evaluations and experiences from visual computer-based speech therapy have shown that this type of therapy is a valuable and effective addition to the speech clinic, as it assists the therapist in achieving the effects described above as well as with many problematic aspects that are part of a speech therapy program. Computer-based speech training with visual feedback stimulates and motivates the children through amusing and varied visual feedback and seems to start a process of awareness, understanding and development of their speech production. However, it is important to point out that even the best computer program could never replace the therapist but only assist and facilitate his or her work. Computer-aided speech training is a supplement to traditional methods and has a pedagogical value for the therapist who has a good knowledge of articulatory and acoustic phonetics as well as of computer techniques. Results have shown that the visual feedback provided by these systems helps severely and profoundly hearing-impaired children and L2 speakers to understand what is wrong and what is correct in their production. It offers meaningful feedback on distinctive contrasts that are not visible via speech-reading and are consequently difficult to learn to produce correctly. Especially appreciated is the very objective evaluation of a child's speech that is provided by a visual computer-based system. The speech therapist often has the difficult role of encouraging and motivating the child at the same time as she/he must criticise and evaluate the child's attempts.
Visual speech training systems give the therapist and the child better possibilities to cooperate. Moreover, game-like strategies to strengthen ability in refined articulation offer a child extensive training on existing skills and help the therapist in the repetitive and additional training phase. Phonemes produced by the children are matched against models by comparing target spectra. To avoid the problem of interspeaker variability, the children’s own “best production” is stored as a target. Another general advantage that has been found is that computer-based training gives the speech therapist increased flexibility, since the task levels and performance criteria offer many choices. Much of traditional training is combined in one piece of equipment, built to display several speech parameters or features, which makes the selection of the most suitable training easy. Furthermore, efforts and improvements in training can easily be registered and documented since the information on the screen can be printed out or saved in files.

One explanation of the positive training results in the clinical evaluations of the three described therapy systems is the very instructive and pedagogical information that these systems provide in the form of meaningful, motivational, easily comprehensible, objective visual feedback that is shown without delay. It helps the children to understand what is wrong and what is correct in their production, especially by means of comparable navigational visual feedback that simultaneously shows the correct model of the teacher and the deviant production of the learner on a split screen. It has been shown to be easy for the therapist to show and instruct the child about “new” speech sounds and the distinctive acoustic and articulatory cues that distinguish Swedish phonemes from each other.
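The spectral matching mentioned earlier in this section, in which a produced phoneme is compared with a target stored from the child's own best production, can be sketched as follows. This is an illustration only: the text does not specify the spectral representation, the distance measure or the acceptance criterion used by the systems, so the band levels, the Euclidean distance, the threshold and all names below are assumptions.

```python
import math


def spectral_distance(spectrum, target):
    """Euclidean distance between two spectra given as equal-length lists of band levels (dB)."""
    if len(spectrum) != len(target):
        raise ValueError("spectra must have the same number of bands")
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(spectrum, target)))


def matches_target(spectrum, target, threshold):
    """Accept an attempt when it lies within `threshold` of the stored target."""
    return spectral_distance(spectrum, target) <= threshold


# Hypothetical band levels in dB; the target stands for the child's own best production.
best_production = [62.0, 55.0, 48.0, 40.0]
close_attempt = [60.0, 56.0, 47.0, 41.0]
distant_attempt = [45.0, 50.0, 60.0, 58.0]

THRESHOLD = 6.0  # assumed acceptance criterion
print(matches_target(close_attempt, best_production, THRESHOLD))    # True
print(matches_target(distant_attempt, best_production, THRESHOLD))  # False
```

Because the target is taken from the same speaker, a simple distance of this kind compares the child with his or her own best attempt rather than with another speaker's voice, which is the motivation given above for storing the best production.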
Training with minimal word pairs, which differ with respect to only one opposition, has proved to be an appropriate and efficient training material, especially in the instruction phase.

Assessing the speech of profoundly hearing-impaired children phonologically prior to therapy has been shown to be extremely important, since it yields valuable pedagogical information about systematic phonological deviations in the existing articulation skills due to visible interference. By this means, further development of deviant processes can be avoided during the speech acquisition of profoundly hearing-impaired children. A phonological analysis is also of importance when assessing the representation of L2 learners’ existing articulation skills in spoken language.

There are some evident risks of computer-based speech training that are worth consideration. To some extent new training techniques need to be developed. There is also always a risk that therapy will be adjusted to the system and not to the child’s needs.

It should also be pointed out that a computer-based therapy system must be user-friendly in order to be used by a therapist. If a program is too time-consuming to set up, too hard to survey or too demanding, this might act as a deterrent to using it, no matter how well developed the rest of the system might be. In addition, detailed manuals should be prepared for all programs.

A long-standing wish is that training courses in computer-based speech training and in performing phonological analyses should be an integrated part of the linguistic schooling of speech therapists. This also applies to teachers of second languages when they assess the representation of L2 learners’ existing articulation skills in spoken language.

10.1.1.
Important demands on a visual computer-based speech therapy system

Results and experiences from computer-based visual speech therapy for profoundly hearing-impaired children of various ages have shown that, in order to be efficient and enhance the possibility for a child to develop intelligible speech, a visual speech training aid must meet a number of important requirements:

• Clear instructions and pedagogical manuals must be created and made available for use with different groups of children and clients.
• The visual feedback of the child’s voice and articulation should be shown without delay.
• The system must be acceptable to the therapist as well as to the child, which means that the system must be attractive, interesting, easily comprehensible, easy to handle, and motivating.
• The visual pattern must be natural, logical, and easily understandable. This means that the training parameters should be displayed intuitively: pitch vertically, following the pitch variations as they occur; intensity through the size of an object that becomes larger as a sound becomes louder and smaller as the sound becomes softer; intonation and stress through a continuous red curve; duration horizontally; and voicing through a change of colour.
• The system should provide contrastive training, that is, the correct model of the therapist and the deviant production of the child are shown simultaneously and compared with each other.
• The system should provide flexible, individual, and structural speech and voice training and give an objective evaluation of the child’s training results.

10.1.2.
Efficiency of visual feedback of prosodic parameters within spoken L2 training

L2 speakers’ ability to discriminate between and to produce Swedish speech sounds, stress patterns, intonation, word accent and rhythm improved with computer-based speech training with audiovisual feedback. Prosodic features in particular improved, owing to the comparable navigational feedback that made the speakers understand in what way their speech differed. Some of the Swedish phonetic contrasts were more easily learned than others. A general opinion of the L2 speakers was that they became more aware of Swedish pronunciation and spoke more Swedish after training. However, most of them wanted the training period to be extended for better long-term results. The possibility of home training was also a general desire.

This work shows that it would be meaningful and advisable to use computer-based speech training with audiovisual feedback for training of both perception and production of Swedish prosody within spoken L2 training (CAPT).

10.2. Comparison of the three systems

Figure 10.1 summarises a comparison of the three product-oriented computer-based speech therapy systems with visual and optional auditory feedback that are reviewed in this thesis. All systems are product-oriented, but the sounds produced by the speaker are visualized in different ways on the computer screen after processing. All systems have optional auditory feedback, game-like exercises, and user management. Dissimilarities and the most distinguishing qualities of each system are shown in the figure.

SpeechViewer is a very powerful system for real-time speech therapy. The thirteen different programs can be used with both children and adults, and with normally hearing as well as hearing-impaired persons.
Some of the programs, aimed at therapy with small children, are very easy for the therapist to use, while others are technically very sophisticated and somewhat complicated to adjust to the client. There is no integrated vocabulary for training or structural training design in the system, which means that good knowledge of phonetics and traditional clinical therapy is required to use the system in the most effective way. Some resourceful programs, such as “Pitch and Loudness” and “Phonology”, are extremely useful also for L2 speakers, but they depend on the therapist’s proficiency to make the best use of them.

Box of Tricks, on the other hand, is a much more user-friendly system and is useful for speech- and hearing-impaired children below 12 years of age. Prerecorded vocabularies, comparable targets of a 10-year-old reference speaker, symbolic speech pictures for each phoneme, and illustrated words for children who have not yet learned to read make the system easy to handle but exclude the possibility of setting up special individual training exercises.

The OLP therapy method for distance learning, by contrast, contains a library of training words to choose from and a structural training method but no reference speaker. The system uses ASR with best productions, which makes it complicated and time-consuming to handle. The evaluation showed that this method was not suitable for use with hearing-impaired children (see section 10.4.3) but is more suitable for adults with functional articulation deviations.

Figure 10.1. Comparison of three computer-based speech therapy systems with visual feedback

However, the integrated program OPTACIA, which works with phonetic visual maps, was shown to be cost-effective and helpful to use with hearing-impaired children.
Suggestions are given in section 10.4.1 for the most efficient use of the different types of visual feedback found in the three systems during different phases of speech therapy.

10.3. Recommended therapy design

10.3.1. General design of computer-based speech therapy with visual feedback

The following recommendation for a general design of computer-based speech therapy with visual and optional auditory feedback, shown in Figure 10.2, results from experience with pedagogical aspects, phonetic and phonological methods, and speech technology in the three existing systems described in this thesis.

First, the individual deviations that have the most impact on intelligibility must be diagnosed. After that, speech material must be introduced, for instance minimal word pairs, that gives good visual instruction to make the child aware of his/her deviant production, in what way it deviates, and how to produce it correctly. The next phase is a training phase aimed at changing a deviant realization, strengthening a successful production and establishing a more correct and intelligible pronunciation. To transfer skills to untrained material and linguistic use, a phase of repetitive and additional training is necessary. The target production must be repeated and practised frequently in a variety of contexts. Guidance, reinforcement and assessment should be provided at every stage of this learning process through specially designed visual and optional auditory feedback. Figure 10.2 displays a diagram illustrating these principles.

A training session should not exceed 30 minutes to be efficient. A complete training period should contain about 10 sessions before the final evaluation. The evaluation should be done by the therapist, for example by comparing recordings of a special text that the child reads before and after the training period.
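The session limits and phases recommended above can be written down as a small planning structure. This is only a sketch under stated assumptions: the class names, the phase labels and the reading of “about 10 sessions” as a minimum of ten are illustrative choices and do not come from any of the three systems.

```python
from dataclasses import dataclass, field

# Limits taken from the recommendation in the text.
MAX_SESSION_MINUTES = 30
SESSIONS_PER_PERIOD = 10

# Illustrative labels for the three training phases (hypothetical names).
PHASES = ("instruction", "training", "repetition")


@dataclass
class Session:
    phase: str
    minutes: int

    def __post_init__(self):
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase}")
        if not 0 < self.minutes <= MAX_SESSION_MINUTES:
            raise ValueError("a session should last at most 30 minutes")


@dataclass
class TrainingPeriod:
    sessions: list = field(default_factory=list)

    def add(self, session):
        self.sessions.append(session)

    def ready_for_final_evaluation(self):
        # Assumption: "about 10 sessions" is treated as at least 10.
        return len(self.sessions) >= SESSIONS_PER_PERIOD
```

A therapist-facing program could use such a check to prompt for the before-and-after recording comparison once the period is complete.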
When it comes to small children who cannot yet read, the evaluation must be based on illustrated material.

Figure 10.2. Components and information flow in computer-based speech training with audio-visual feedback

10.3.2. Structural training design

A structural therapy and assessment procedure to be used with children who have some verbal skills was developed by Öster (1996) to help the therapists in their work, for the following two reasons. One reason was that SpeechViewer is a tool for speech training that does not contain any speech-training material and is not structured in different levels of therapy but depends on the therapist’s inventiveness. It consists of different programs for different types of training. Because of this, therapists working with the program must have good knowledge of both phonetics and computer techniques to utilise the system in the best way. The other reason was that therapy and assessment are inseparable procedures, as assessment is required regularly during training.

To assist the therapists in the assessment procedure as well as in the structural training, the procedure was elaborated in the form of a protocol. Training and assessment should be done on six levels, following the steps of traditional speech therapy, starting with motor training on isolated speech sounds for training of basic skills like respiration, intensity, pitch, phonation and articulation. Then follow syllables where the consonant is chosen on the basis of visibility, repeated syllables with different stress patterns, repeated alternated syllables for automatization of prosody, words containing the speech sounds of current interest for relevant training, and finally a short phrase containing the topical words.
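The six levels just described can be represented as an ordered protocol in which the therapist notes, level by level, whether the child has mastered the material. The sketch below is an illustration only: the `Protocol` class, its method names and the single yes/no mastery flag per level are assumptions made for the example and are not a description of the protocol form itself.

```python
# The six levels, in the order given in the text.
LEVELS = (
    "isolated speech sounds",
    "syllables",
    "repeated syllables with different stress patterns",
    "repeated alternated syllables",
    "words",
    "short phrases",
)


class Protocol:
    """Hypothetical per-child record of progress through the six levels."""

    def __init__(self):
        # None means the level has not been assessed yet.
        self.mastered = {level: None for level in LEVELS}

    def record(self, level, mastered):
        if level not in self.mastered:
            raise KeyError(level)
        self.mastered[level] = bool(mastered)

    def next_level(self):
        """Return the first level not yet mastered, or None when all are."""
        for level in LEVELS:
            if not self.mastered[level]:
                return level
        return None
```

Because the levels are ordered, training always resumes at the earliest level that is not yet mastered, which mirrors the idea of building on existing skills.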
Throughout, the child’s mastery of breath control, intensity, pitch, duration, and voice quality on all levels should be listed in the protocol by the therapist, according to statistics provided by the system and according to the therapist’s opinion. This structural training design in six levels has also been added to Box of Tricks as well as to the OLP therapy method, see the list below. Characteristic of such training is that it is the most effective, as it is based on and expands the existing articulation skills.

• Respiration
• Intensity
• Pitch
• Phonation
• Articulation

Vowels
• Isolation
• Syllables
• Repeated syllables with different stress patterns
• Words (one-two-polysyllables)
• Minimal word-pairs
• Short phrases

Consonants
• Isolation
• Syllables and sound sequences (initial, medial, final)
• Repeated syllables
• Words (one-two-polysyllables)
• Minimal word-pairs
• Short phrases

Training exercises are designed on the different levels so that training on the segmental level is expanded to the word level with the same phonemes. Training on the word level is then expanded to the phrase and sentence level, resulting in an expansion of the training of the same phonemes and words.

10.4. Visual feedback strategies

Experiences from these three systems have shown that, in order to make the therapy as effective as possible, it is important that certain types of visual feedback are connected to specific phases in the training process of different user groups.

10.4.1. Type of visual feedback for severely and profoundly hearing-impaired children

An applicable strategy for severely and profoundly hearing-impaired children is recommended in Figure 10.3. Children with other types of articulatory impairments, with or without hearing impairments, will also benefit from computer-based speech therapy with this type of well-developed visual feedback strategy.
During the instruction phase, animated graphics and a comparable navigational feedback are appropriate in order to make the child aware of his/her deviant production, in what way it deviates, and how to produce it correctly. A criterion-referenced navigational feedback, based on visual targets of the child’s production that are compared to the norm and to the child’s own productions, is also suitable for use in the instruction phase as well as in the training phase, to let the child see the improvements. A navigational feedback that is dependent on correct control of the voice, pitch or intonation is informative in the training phase, as is an evaluative feedback that immediately rewards the child if the production is acceptable and more correct than before. The reward is intended to be encouraging, as it reinforces the slightest improvement. Training based on a phoneme-based comparison with a stored model of the child’s best production so far can stimulate the child in various amusing game-like layouts. As the child’s production improves, new models must of course be stored as targets in a simple and quick way.

Figure 10.3. Therapy design and visual feedback strategies for severely and profoundly hearing-impaired children.

During the last phase, where a great deal of repetitive and additional training is needed to automate correct production for transferring skills to linguistic use, the evaluative feedback should only reward the child when the production is correct and hits the mark. Many phoneme-based recognition exercises that are amusing, motivating, and challenging are needed to intensify the training.

10.4.2. Type of audio-visual feedback for L2 learners

Audio-visual feedback of prosodic parameters such as intonation contours, stress patterns and fluency has been shown to be efficient for L2 learners and supports a wider usage within CAPT.
Figure 10.4 illustrates relevant parts in this approach. The training phases are identical, as can be seen in the figure.

Figure 10.4. Feedback strategies for L2 learners.

In all phases (instruction, training, and repetitive and additional training for transfer) a comparable navigational visual feedback accompanied by auditory feedback for perceptual training is recommended. The auditory and visual similarity of the teacher’s model and the client’s production is enough to stimulate the client to continue the training in a meaningful way.

There is substantial research concerning the use of automatic word recognition in CAPT, as it automatically detects segmental errors fairly well. However, there are still problems with identifying prosodic deviations, a fact that supports a wider usage of this type of comparable navigational audio-visual feedback of prosodic parameters in CAPT.

10.4.3. Use of automatic speech recognition and spectral comparison of phonemes with profoundly hearing-impaired children

The use of automatic word recognition and spectral comparison of phonemes has been evaluated with profoundly hearing-impaired children in this study. It was a general opinion that the feedback from the automatic word recognition in the OLP method was unsuitable for use with profoundly hearing-impaired children, for three reasons. In the first place, the children felt unsure when there was no encouraging, rewarding feedback after a good try. Moreover, there was no navigational feedback during training, and no possibility of contrastive training that visually compares the child’s effort with a correct model. These three types of visual feedback have been shown to be the most central ones to stimulate and inform severely and profoundly hearing-impaired children in their speech acquisition.

The normal goal of ASR is to classify all utterances correctly, even if they are not pronounced accurately.
However, in pronunciation teaching with severely and profoundly hearing-impaired children the system must be able to do more than distinguish between good and poor pronunciations. Intelligibility is in focus, and the quality of the child’s speech determines whether most people can understand them easily or with difficulty. Therefore such a system must provide an acceptable evaluative feedback that gives a measure of goodness and shows when the child is doing well and has made some improvement. Thus the two kinds of system, ASR and pronunciation teaching systems, have different aims.

A further reason is that severely disordered speech produced by profoundly hearing-impaired children often contains temporal, phonatory and aerodynamic deviations that correlate poorly with ASR accuracy. Instead, the segment-based speech comparison used in SpeechViewer and Box of Tricks, which compares a recorded phoneme inside a word with a stored production, performed extremely well. This particular method is known as speaker-dependent: a particular utterance is compared to a stored target of the child’s own best production. Phoneme-based speech recognition has been shown to be useful in programs for repetitive and additional training that aim at strengthening a successful production and at making the child’s best production his/her most common production in all possible surroundings.

10.5 Conclusions

This thesis presents work carried out over many years on the possibilities for severely and profoundly hearing-impaired children to acquire speech skills through computer-based speech therapy with visual feedback. The technique is also tested and evaluated in pronunciation training with adult second language learners.
Studies that investigate the effects of speech input limitations on speech production, the interaction of individual deviations with speech intelligibility, phonetic realizations of phonological systems and the use of visual feedback in computer-based speech therapy reflect the problems and possibilities that must be considered when teaching speech to severely and profoundly hearing-impaired children. Hypotheses are presented that a specialized individual diagnostic method must be carried out before therapy as a basis for the therapy. Different visual feedback strategies and therapy designs should be used for each of the two client groups, children with severe and profound hearing losses and adult second language learners, so that the clients benefit from computerized speech training.

It is well documented that the speech intelligibility of severely and profoundly hearing-impaired children varies a great deal and cannot be predicted from the degree of hearing loss, as measured by pure-tone audiometry. The sort of speech a profoundly hearing-impaired child might develop depends more on the functional hearing for speech, that is, residual hearing for speech, amount of hearing aid use, amount of auditory training, discrimination ability of speech features, learned ability to identify speech sounds, phonological short-term memory and speech processing capabilities. A method for testing the functional hearing for speech through an analytical speech perception test that gives supplementary information to the pure-tone audiogram is described and tested.
An important part of the thesis discusses different relevant aspects of the speech of severely and profoundly hearing-impaired children, for instance age of onset of hearing loss and dependence on visibility of speech sounds, and reviews determining factors and prerequisites for severely and profoundly hearing-impaired children to benefit from speech therapy and develop intelligible speech. The objective is to give the necessary background on why a special visual feedback strategy and therapy design must be used during speech therapy for severely and profoundly hearing-impaired children, in contrast to what is most useful in spoken language training with adult second language learners.

The hypothesis that the use of audio-visual comparable feedback helps L2 learners in perceiving and producing prosodic parameters in particular is tested in the thesis. The clinical evaluation shows promising results that support a wider usage of audio-visual feedback in CAPT systems.

The hypothesis concerning a diagnosis method, in the form of an individual phonological assessment based on a detailed phonetic analysis, states that this kind of analysis gives information about how the articulation is realized linguistically and reveals deviantly realized phonological contrasts. Part of the work deals with the development of a method for such a phonological assessment, which investigates what profoundly hearing-impaired children’s speech does express and how speech sounds are realized by L2 learners. The method is tested and is shown to assist the therapist in deriving significant and individual information about the clients’ speech productions, as it makes it possible to outline a constructive and individualized speech therapy with well-designed visual feedback.
The phonological assessment provides information about deviations in different linguistic contexts of those speech sounds that the clients know the articulation of and the way these differ from the normal model. In addition, the method provides information about the speech sounds that they cannot yet produce. A phonological assessment of a profoundly hearing-impaired child’s speech also shows whether a deviant pronunciation in fact signals a contrast of meaning and, if so, in what way the phonetic element used differs from the normal model.

Three Swedish computer-based speech therapy systems with visual feedback are presented in the thesis. The systems are assessed with reference to functionality, speech visualisations, types of visual feedback, therapy design and practical usability. Fundamental principles of the three systems and their different solutions for working with reference speech, target spectra, contrastive training, objective evaluation, understandable visualisations and clinical management are discussed.

Two of the systems are available as commercial products: the IBM SpeechViewer III and Box of Tricks, called Trollerilådan in Sweden. The original SpeechViewer program was developed 15 years ago and is available in twenty different languages. Box of Tricks is a result of the completed SPECO-Project, funded by the EU INCO-COPERNICUS program, and has been developed in four languages: English (Box of Tricks), Hungarian (Varázsdoboz), Slovenian (ARTI) and Swedish (Trollerilådan). The third system, OLP-therapy, is a result of the recently finished EU Quality of Life and Management of Living Resources project, with participation from Greece, France, Spain, Sweden, and the UK.

All the systems are developed for children with speech and/or hearing impairments who have difficulties in understanding and producing spoken language.
They all use visual information as an alternative feedback but present it in different ways. All systems use different kinds of visualisations and speech pictures of important speech parameters, correct from the acoustic-phonetic point of view and designed to be understandable and interesting for young children. SpeechViewer and Box of Tricks are designed to supplement speech therapy, while the most innovative aspect of the OLP-therapy system is a distance learning application built entirely on distributed (Internet) technology.

Several clinical evaluation studies with severely and profoundly hearing-impaired children as well as with adult second language learners investigate the effectiveness of the systems, the degree of transfer and generalization, and the overall functionalities. The evaluations showed positive results in many ways, for instance in assisting the therapists to instruct and explain what is wrong and what is correct articulation through comparable visual feedback, in helping the client with significant amounts of additional training, in making the clients aware of non-visible manner and place of articulation through immediate and meaningful visual feedback, and in achieving more positive cooperation in the speech clinic.

The recommendations specified at the end of the thesis, about how to utilize different types of visual feedback during different phases of therapy and about adjusting a structured therapy design to different client groups, are important to take into consideration in future developments of computer-based speech therapy systems with visual feedback.

11. References

Abberton, E., Fourcin, A., Hazan, V. (1985). Phonological competence with profound hearing loss, Paper presented at the Int. Congress on Education of the Deaf, Manchester.
Adolvsson, K., Forsén, H. (1968).
Samband mellan talförståelse och talkvalitet, Examensarbete vid Lärarhögskolan i Stockholm (Manillaseminariet). Handledare Arne Risberg, KTH.
Amcoff, S. (1973). Relationer mellan språkliga uttrycksformer. En undersökning av elever i specialskolan för hörselskadade. Pedagogisk forskning, Uppsala, No. 1.
Arends, N., Povel, D. J., Michielsen, S., Claassen, J., Feiter, I. (1991). An Evaluation of the Visual Speech Apparatus (VSA), Speech Communication, 10, 405-414.
Arends, N. (1993). The visual speech apparatus, IvD/RES/9303/01, Instituut voor Doven, Sint-Michielsgestel, The Netherlands.
Arlinger, S., & Hagerman, B. (1997). The Swedish approach to speech audiometry. Speech Audiometry, 2nd edn. (ed. M. Martin): Whurr Publishers, London.
Bamford, J., Saunders, E. (1991). Hearing Impairment, Auditory perception and Language Disability, Studies in Disorders of Communication, Whurr Publ., London Jersey City, ISBN 1-870332-01-6.
Bannert, R. (1990). På väg mot svenskt uttal. Lund: Studentlitteratur.
von Békésy, G. V. (1959). Similarities between hearing and skin sensations, Psychol. Rev., 66, 1-22.
Bench, R. J. (1992). Communication skills in hearing-impaired children, Whurr Publishers Ltd, ISBN 1-56593-075-4, London N1 2UN, England.
Bernstein, J. (1977). Intelligibility and simulated deaf-like segmental and timing errors. Record IEEE Int. Conf. Acoust. Speech and Signal Processing, Hartford.
Beskow, J., Granström, B., House, D., Lundeberg, M. (2000). Experiments with verbal and visual conversational signals for an automatic language tutor. Proc of InSTIL 2000, Dundee.
Beskow, J. (2003). Talking Heads. Models and Applications for Multimodal Speech Synthesis, Ph.D Thesis, KTH, Sweden. ISBN 91-7283-536-2.
Beskow, J., Karlsson, I., Kewley, J. and Salvi, G. (2004). SYNFACE - A Talking Head Telephone for the Hearing-impaired.
In K Miesenberger, J Klaus, W Zagler, D Burger eds, Computers helping people with special needs, 1178-1186.
Binnie, C.A., Daniloff, R.G. and Buckingham, H.W. (1982). Phonetic disintegration in a five-year-old following sudden hearing loss, J. Speech and Hearing Disorders, Vol 47, p 181-189.
Boothroyd, A. (1970). Concept and control of fundamental voice frequency in the deaf – an experiment using a visible pitch display, Paper presented at the International Congress of Education of the Deaf, Stockholm, Sweden.
Boothroyd, A. (1984). Auditory perception of speech contrasts by subjects with sensorineural hearing loss. Journal of Speech and Hearing Research, 27, 128-134.
Boothroyd, A. (1995). Speech perception tests and hearing-impaired children, in Profound Deafness and Speech Communication, G. Plant and K-E Spens, London, Whurr Publisher Ltd, pp. 345-371.
Borg, E., Risberg, A., McAllister, B., Undemar, B. M., Edquist, G., Reinholdson, A-C., Wiking-Jonsson, A., Willstedt-Svensson, U. (2002). Language development in hearing-impaired children. Establishment of a reference material for a 'Language test for hearing-impaired children', LATHIC. Int. J. Pediatr. Otorhinolaryngol. 1;65(1):15-26.
Bush, C.N., Edwards, M.L., Luckau, J.M., Stoel, C.M., Macken, M.A., Petersen, J.D. (1973). On specifying a system for transcribing consonants in child language: A working paper with examples from American English and Mexican Spanish. Report, Dept. of Linguistics, Stanford University.
Bälter, O., Engwall, O., Öster, A-M., Kjellström, H. (2005). Wizard-of-Oz Test of ARTUR - a Computer-Based Speech Training System with Articulation Correction. In Proceedings of the Seventh International ACM SIGACCESS Conference on Computers and Accessibility, pp. 36-43, October 9-12, 2005, Baltimore, MD.
Calvert, D. (1961). Some Acoustic Characteristics of the Speech of Profoundly Deaf Individuals. Ph.D. thesis, Stanford University, Palo Alto, CA.
Calvert, D.R. and Silverman, S.R.
(1975). Speech and Deafness. Washington: Alexander Graham Bell Association for the Deaf.
Carlson, R., Granström, B., Hunnicutt, S. (1982). A multi-language text-to-speech module, Proc. ICASSP 1982, 3, 1604-1607.
Cohen, M. L. (1968). The ADL Sustained Phoneme Analyzer, American Annals of the Deaf, 113, 168-177.
Cowie, R. and Douglas-Cowie, E. (1992). Postlingual acquired deafness: Speech deterioration and the wider consequences, Berlin, ISBN 3110125757, Publisher Mouton de Gruyter.
Cramer, K.D., and Erber, N.P. (1974). A spondee recognition test for young hearing-impaired children. Journal of Speech and Hearing Disorders, 39, 304-311.
Crepy, H., Denoix, B., Destombes, F., Rouquie, G., Tubach, J-P. (1983). Speech Processing on a Personal Computer to Help Deaf Children, World Computer Congress, 669-671.
Crepy, H., Destombes, F., El Breze, M., Rouquie, G. (1986). Speech training on a personal computer, Int. Congress on Acoustics, Toronto.
Dalby, J., Kewley-Port, D. (1999). Explicit pronunciation training using automatic recognition technology, Calico Journal, 16, 3, 425-445.
Dodd, B. (1974). The acquisition of phonological skills in normal, severely subnormal and deaf children. Doctoral Dissertation, Univ. of London.
Dodd, B. (1976). The phonological systems of deaf children. JSHD, 41, 2, 185-197.
Dodd, B. (1988). Lip-Reading, Phonological Coding and Deafness. In Dodd and Campbell, eds., Hearing By Eye: The Psychology of Lip-Reading, 177-189.
Ellegård, A. (1982). Språket och hjärnan, Hammarström & Åberg, ISBN 91-7638-031-9.
Engwall, O., Wik, P., Beskow, J., Granström, B. (2004). Design strategies for a virtual language tutor, In Proc of ICSLP 2004, vol. III: 1693-1696. Jeju Island, Korea, 4-8 October. Editors: Soon Hyob Kim and Dae Hee Youn.
Engwall, O., Bälter, O., Öster A-M., Kjellström, H. (2006).
Designing the user interface of the computer-based speech training system ARTUR based on early user tests, to appear in Journal of Behavioural and Information Technology.
Erber, N.P. (1974a). Pure-tone thresholds and word-recognition abilities of hearing-impaired children. Journal of Speech and Hearing Research, 17, 194-202.
Erber, N.P. (1974b). Visual perception of speech by deaf children: recent developments and continuing needs. Journal of Speech and Hearing Disorders, 39:2, 178-185.
Erber, N.P. (1977). Speech perception by profoundly hearing-impaired children, Research Conference on Speech-Processing Aids for the profoundly hearing-impaired, Gallaudet College, Washington, 23-26 May.
Eriksson, E., Bälter, O., Engwall, O., Öster, A-M., Kjellström, H. (2005). Design Recommendations for a Computer-Based Speech Training System Based on End-User Interviews. In Proceedings of the Tenth International Conference on Speech and Computers, pp. 483-486, October 17-19, 2005, Patras, Greece.
Eskenazi, M. (1996). Detection of foreign speaker’s pronunciation errors for second language training-preliminary results, Proceedings of the international conference on spoken language processing, ’96.
Eskenazi, M. (1999). Using a Computer in Foreign Language Pronunciation Training: What Advantages?, Calico Journal, 16, 3, 447-469.
Ewertsen, H. W. (1973). Auditive, visual & audio-visual perception of speech (Operation Helen. First preliminary report), The State Hearing Centre, Bispebjerg Hospital, Copenhagen.
Ewing, I. R. (1941). Lipreading for adults, Teacher of the Deaf, 39, 3-6.
Fairbanks, G. (1960). Voice and articulation drillbook, 2nd edn. New York: Harper & Row. pp. 124-139.
Fant, G. (1959). Acoustic analysis and synthesis of speech with applications to Swedish, Ericsson Technics No. 1, 1-108.
Fant, G. (2001). On the speech code, TMH-QPSR, KTH; 61-67, Vol 42.
Finnerty, J. (1996).
Analyzing the Development of Early Childhood Language, Educational Software Research Inc, Lexington Mass. Fisher, C.G. (1968). Confusions among visually perceived consonants. Journal of Speech and Hearing Research, 11, 796-804. Fischer, S. (1995). Critical periods for language acquisition: Consequences for deaf education, 18th international Congress on education of the deaf, Tel-Aviv, Israel, July 16-20. Flege, JE. (1989). Using visual information to train foreign language vowel production, Language Learning 38, 365-407. Flege, JE. (1998). Second-language learning: The role of subject and phonetic variables. Proc Speech Technology in Language Learning (STiLL 98), Marholmen, Sweden. Fletcher, S. G. (1986). Visual feedback and lip-positioning skills of children with and without impaired hearing, JSHR, 29, 231-239. Fletcher, S. G., Dagenais, P. A., Critz-Crosby, P. (1991). Teaching Consonants to Profoundly Hearing-Ompared Speakers Using Palatometry. Journal of Speech and Hearing Research, 34, 929-942. Fourcin, AJ. & Abberton, ERM. (1971). First application of a new laryngograf. Med Biol Rev, 21: 172-182. Furth, H. (1964). Research with the deaf: implications for language and cognition. Psychological Bulletin 62, 145-164. Gold, T. (1980). Speech Production in Hearing-Impaired Children, J. Comm.Dis. 13, 397-418. Göllesz. (1972). ref. by Mills, A. 1983, see below. Granqvist, S. (1998). Spruce signal workstation add-on package, Stockholm, Sweden. See information on web page of Hitech Development AB at http://www.hitech.se/development/, april 2006. Granström, B. & Öster, A-M. (1994a). Speech synthesis for hearingimpaired persons - in research, training and communication, Proceedings from 2nd Int. Symposium on Speech and Hearing Sciences, Sept. 24-25 1994, Osaka, Japan, 49-65. Granström, B. & Öster, A-M. (1994b). Speech synthesis for hearingimpaired persons - in research, training and communication, STL/QPSR 2-3/94, 93-111. 
213 Computer-Based Speech Therapy Using Visual Feedback Grewel, F. (1963). Remarks upon the acquisition of language in deaf children, Language and Speech, 1963, 6, part 1, 37-45. Grunwell, P. (1985). Phonological assessment of child speech (PACS). NFER-Nelson, Winsdor, UK/College-Hill Press, San Diego, CA. Grunwell, P. (1987). Clinical Phonology, Williams and Wilkins, Baltimore. Gustafsson, A. (1984). Svenskspråkiga färdigheter hos specialskolelever, en litterturöversikt och empirisk studie, Högskolan i Örebro, Institutionen för psykologi och pedagogik. Gårding, E. & Bannert, R. (1979). Optimering av svensk uttalsundervisning, Praktisk Lingvistik 1, 1-9. Lund: Department of Linguistics, Lund University. Hatzis, A. (1999). Optical Logo-Therapy (OLT), Computer-based audio-visual feedback using interactive visual displays for speech training. Dept. of Computer Science, University of Sheffield, UK. Hatzis, A., Green, P. D., Howard, D. (1999). Optical logo-therapy (OLT): Visual displays in practical auditory phonetics teaching, Phonetics teaching and learning conference (PTLC '99), April 1999. Hatzis, A. & Green, P. D. (2001). A two dimensional kinematic mapping between speech and acoustics and vocal tract configurations. Workshop on Innovation in Speech Processing, (WISP’01), Stratford-upon-Avon, UK. Hincks, R. (2005). Computer Support for Learners of Spoken English, Dissertation, KTH, Stockholm. Hochberg, I., Levitt, H., and Osberger, M.J. (1983). Speech of the Hearing-Impaired, Research, Training and Personal Preparation. Baltimore, MD: University Park Press. Hodson, B.W. (1980). The Assessment of Phonological Processes. Danville, IL.: Interstate Inc. Holmber, E., Sahlén, B. (2000). Nya Nelli, Pedagogisk Design, Malmö. Hudgins, C. and Numbers, F. (1942). An Investigation of the Intelligibility of the Speech of the Deaf. Genet. Psychol. Monogr. 25, pp. 289-392. 214 References Huggins, A.W.F. (1977). 
Timing and speech intelligibility, in Attention and Performance, VII (ed. J. Requin). Ingram, D. (1989). Phonological Disability in Children. (2nd ed.), London: Sole and Whurr. Jakobson, R. (1968). Child Language, Aphasia and Phonological Universals. The Hague: Mouton and Co. Jamieson, DG. (1995). Techniques for training difficult non-native speech contrasts. Proc XIIIth Intl Congress Phonetic Sciences, Stockholm. John, J.E.J. and Howarth, J. (1965). The Effect of Time Distortions on the Intelligibility of Deaf Children's Speech. Lang. and Speech, 8, 127-134. Javkin, H. (1994). A new Speech Training System: Acoustic/Articulatory Data, Video Games and Synthesized Model Parameters. 2nd Int. Symposium on Speech and Hearing Sciences, 81-92. Osaka, Japan. Kent, R. D., Osberger, M. J., Netsell, R., Goldsmith Hustedde, C. (1987). Phonetic Development in Identical Twins Differing in Auditory Function, JSHD, Vol.52, 64-75. Kewley-Port, D., Watson, C. S., Elbert, M., Maki, D. and Reeds, D. (1991). The Indiana Speech Training Aid (ISTRA) II: Training curriculum and selected case studies, Clinical Linguistics & Phonetics, 1991, Vol. 5, No. 1, 13-38. Kewley-Port, D. & Watson, C. S. (1995). Computer Assisted Speech Training: Practical Considerations, Applied Speech Technology, Chapter 21, 565-582,: Boca Raton: CRC Press. Kjellin, O. (1997). Svenskt uttal i verkligheten, CD, Hallgren och Fallgren Studieförlag AB, ISBN: 91-7382-750-9. Kruger, F., Stromberg, H., Levitt, H. (1972). Synthetic speech as a diagnostic tool. CSL Research Report No. 2, June. LaRocca, S. (1994). Expoiting strengths and avoiding weakness in the use of speech recognition for language learning, CALICO Journal, 12, 1, 102-105. Larson Education AB, “Lingus”, http://www.larsoneducation.se. Larsson, T. (1997). Föreläsning om “Hörselskador”, Rehabiliteringsteknik AK ht 1997, CERTEC, Lund. 215 Computer-Based Speech Therapy Using Visual Feedback Levitt, H. & Geffner, D. (1987). 
Communication skills of young hearing-impaired children, Development of language and communication skills in hearing-impaired children, ASHA Monographs, 26, 123-158. Levitt, H. (1987). Interrelationship among the speech and language measures, in Development of language and communication skills in hearing-impaired children. ASHA Monographs, 26, 123158. Levitt, H. (1993). The impact of technology on speech rehabilitation. Proceedings of an ESCA Workshop on Speech and Language Technology for Disabled Persons. Stockholm, Sweden. Lidén, G. (1985). Audiologi, Almqvist & Wiksell. Lindblom, B. and Moon, S-J. (1988). Formant undershoot in clear and citation form speech. Perilus VIII (Phonetic Experimental Research, Institute of Linguistics, University of Stock¬holm), 2032. Ling, D. (1976). Speech and the Hearing-Impaired Child: Theory and Practice. The Alexander Graham Bell Association for the Deaf, Inc. Washington, D.C. 20007, U.S.A. Lotson, A. (2001). Bättre tal med datorstöd – synsinnet kan ersätta skadad hörsel i talträning, Computer Sweden, 34-35. Loughlin (2005). http://users.aber.ac.uk/vil1/approach.htm. Maassen, B. and Powel, D.J (1984). The effect of correcting temporal structure on the intelligibility of deaf speech, Speech Comm. 3, 123-135. Maassen, B. and Powel, D.J (1985). The effect of segmental and suprasegmental corrections on the intelligibility of deaf speech. Journal of the Acoustical Society of America, 78, 877-886. Mahshie, J. J., Herbert, E., Hasegawa, A. (1984). Use of air-flow feedback to modify deaf speaker’s consonant voicing errors. Asha, 26, 10. Mahshie, J. J. (1990). Speech training with deaf children, Seminar held at the Dept. of Speech, Music, and Hearing, KTH, Stockholm, 6 April, 1990. 216 References Mahshie, J. J and Yadab, P. (1990). The Gallaudet University Speech Training and Evaluation System (GUSTES) for deaf children, Asha, 32, 10, 75. Mahshie, J. J. (1995). 
The Use of Sensory Aids for Teaching Speech to Children who are Deaf, Profound Deafness and Speech Comm., London: Whurr Publ. Ltd., 461-491. Mahshie, J. J. (1996). Feedback considerations for speech training systems, Proceedings of 4th ICSLP–96, 153-156, Philadelphia. Mahshie, J. J. (1998). Balloons, Penguines, and Visual Displays. SpeechViewer III: Solid Tool for Specialists, Perspectives in Education and Deafness, 16, 4. Maltby, M. (2000). A new speech perception test for profoundly deaf children, Deafness and Education International, 2, 2, 86-101. Markides, A. (1985). Type of pure tone audiogram configuration and rated speech intelligibility, Journal of British Association of the Profoundly Hhearing-impaired, 2, 33-36. Markides, A. (1989). Lipreading: Theory and Practice. Journal of Brit. Assn. Teachers of the Deaf, 13:2, 29-47. Mártony, J. (1966). Method of correcting the voice pitch level of hard of hearing subjects, STL-QPSR, 7(2):19-22. Martony, J. (1968). On the correction of the pitch level for severely hard of hearing subjects, American Annals of the Deaf, 113, 2, 195202. Mártony, J., Risberg, A., Agelfors, E., and Boberg, G. (1970). Om talavläsning med elektronisk avläsningshjälp, Intern rapport, Inst. för Talöverföring, KTH. Mártony, J. (1971). Om gravt hörselskadades tal, Fil.lic. avhandling, Inst. för Talöverföring, KTH, Stockholm. Martony, J., Risberg, A., Spens, K-E., Agelfors, E. (1972). Results of a rhyme-test for speech audiometry, Proc. of Int. Symp. on Speech Communication Ability and Profound Deafness, Stockholm, A.G. Bell Association for the Deaf (ed. G. Fant), 75-80. Martony, J. (1974). On a rhyme test, STL-QPSR 2-3, 57-71. Martony, J., Nordström, P-E. (1995). On vowel production in deaf children, VIII Int. Congress of Phonetic Sciences, Leeds,Paper 194. 217 Computer-Based Speech Therapy Using Visual Feedback Massaro, D. W., Light, J. (2004). 
Using visible speech to train perception and production of speech for individuals with hearing loss, Journal of Speech, Language and Hearing Research, 47,304-320. Mavilya, M. (1972). Spontaneous vocalization and babbling in hearing impaired infants, In G. Fant (Ed) International Symposium on Speech Communication Ability and Profound Profoundly Deafness, Washington, DC. Alexander Graham Bell Association for the Profoundly hearing-impaired, 163-171. McAllister, R. (1986). Tekniska hjälpmedel i Uttalsundervisningen: en delrapport. PU-rapport 1986:1, Stockholms universitet, Pedagogiska utvecklingsenheten. McAllister, R. (1995). Perceptual foreign accent and L2 production, Proc XIIIth Intl Congress of Phonetic Sciences, Stockholm. McAllister, R., Flege, J.E., Piske, T. (2002). The influence of L1 on the acquisition of Swedish quantity by native speakers of Spanish, English and Estonian, Journal of Phonetics, 30, 229-258. McConachie, H. (1990) Early language development and severe visual impairment, Child: care, health and development, 16 (1), 5561. McDonald, E. (1964). Articulation Testing and Treatment, Pittsburgh, Stanwix House, 1964. Merklein, R. A. (1981). A short speech perception test, The Volta Review, 36-46. Miller, G. A. (1951). Language and Communication, McGraw-Hill, New York, p. 39. Mills, A. (1983). The development of phonology in the blind child. Dodd and Rampbell, eds., Hearing by Eye: The Psychology of LipReading, 145-161. Mogford, K. (1983). Lip-reading in the prelingually profoundly hearing-impaired, (Dodd and Rampbell, eds.) Hearing by Eye: The Psychology of Lip-Reading, 191-211. Monsen, R.B. (1976). The production of English stop consonants in the speech of deaf children. Journal of Phonetics, 4, 29-41. Monsen, R.B. (1978). Toward measuring how well hearing-impaired children speak. Journal of Speech and Hearing Research, 21, 197-219. 218 References Monsen; R.B. (1983). Voice quality and speech intelligibility among deaf children. 
American Annals for the Deaf, 128, 12-19. Moon, S-J. (1991). An acoustic and perceptual study of undershoot in clear and citation-form speech. Perilus XIV, 153-156 (Phonetic Experimental Research, Institute of Linguistics, University of Stockholm). Neri, A., Cucchiarini, C., Strik, H., Boves, L. (2002). The pedagogytechnology interface in computer assisted pronunciation training, Computer Assisted Language Learning, 15, 5, 441-467. Nettelbladt, U. (1983). Developmental studies of dysphonology in children, Doctoral Dissertation. Lund: Lieber Läromedel. Nickerson, R., Stevens, K (1973). Teaching speech to the deaf. Can a computer help? IEEE Trans. Audio Electro-acoustics, AU-21, 445455, 1973. Nickerson, R. S., Kalikow, D. N. & Stevens, K. N. (1976). Computeraided speech training for the deaf, JSHD, 41, 120-132. Oller, D.K., Kelly, C.A. (1974). Phonological substitution processes of a hard-of-hearing child. JSHD, 39, 65-74. Oller, D.K., Eilers, R.E. (1981). A pragmatic approach to phonological systems of deaf speakers. Speech and Language, Advances in basic research and practice, 103-141, Academic press. Oller, D. K, Eilers, R. E, Neal, A. R, Schwartz, H. K, J. (1999). Precursors to speech in infancy: the prediction of speech and language disorders, Commun. Disord. 32, 223-245. Oller, D. K. (2000). The emergence of the speech capacity, Lawrence Erlbaum Associates, Publishers, Mahwah, New Jersey. Osberger, M.J. and Levitt, H. (1979). The Effect of Timing Errors on the Intelligibility of Deaf Children's Speech. JASA, 66:5, 13161324. Osberger, M J., Moeller, M. P. & Kroese, J. M. (1981). Computerassisted speech training for the hearing impaired, J. of the Academy of Rehabilitative Audiologists, 14, 145-158. Osberger, M., J. (1992). 
Speech intelligibility in the hearing impaired: Research and clinical implications, in Intelligibility in Speech Disorders, Theory, Measurement and management, Edited by 219 Computer-Based Speech Therapy Using Visual Feedback Raymond D: Kent, John Benjamins Publishing Company, Amsterdam, pp.233-265. Öster, A-M. (1985). The Use of a Synthesis-by-Rule-System in a Study of Deaf Speech. STL-QPSR 1/1985, 95-107. Öster, A-M. (1988a). Datorer I talundervisningen – Ett nytt hjälpmedel?, Nordisk Tiskrift för Dövundervisningen, 1, 14-19. Öster, A-M. (1988b). Computer-based speech training, Proceedings of Speech ´88, 7th FASE Symposium, Edingburgh, Book 2, 645-651. Öster, A-M. (1989a). Studies on phonological rules in the speech of the deaf. STL/QPSR 1/89, 159-162. Öster, A-M. (1989b). Applications and experiences of computerbased speech training. STL/QPSR 4/89, 37-44. Öster, A-M. (1989c). Applications and experiences of computer-based speech training, Proceedings of European Conference on Speech Communication and Technology (Eurospeech), Paris, 714-717. Öster, A-M. (1990). The effects of prosodic and segmental deviations on intelligibility of deaf speech, STL/QPSR, 79-88. Öster, A-M. (1991). Phonological assessment of eleven prelingually deaf children's consonant production. STL-QPSR 2-3/91, 11-18. Öster, A-M. (1992a). The speech of deaf children – Phonological assessment as a basis for speech training, Thesis work for the Licentiate Philosophy degree in Phonetics, University of Stockholm, Institute of Linguistics. Öster, A-M. (1992b). Phonological assessment of deaf children's existing articulation skills as a basis for speech training. Proceedings of ICSLP 92, October 12-16, Banff, Alberta, Canada, 955-958. Öster, A-M. (1995a). Principles for a complete description of the phonological system of deaf children as a basis for speech training, Profound Deafness and Speech Comm., London: Whurr Publ. Ltd., 441-461. Öster, A-M. (1995b). 
Teaching speech skills to deaf children by computer-based speech training, Proceedings of 18th International Congress on Education of the Deaf, Tel-Aviv, Israel. Öster, A-M. (1995c). Teaching speech skills to deaf children by computer-based speech training, STL-QPSR 4/95, 67-75. 220 References Öster, A-M. (1995d). Resultat från datorbaserad röstträning med ett gravt hörselskadat förskolebarn, Nordisk Tidskrift för Dövundervisningen, 125-133. Öster, A-M. (1996). Clinical applications of computer-based speech training for children with hearing impairment. Proceedings of ICSPL 96, 157-160. Philadelphia, USA. Öster, A-M. (1997). Auditory and visual feedback in spoken L2 teaching, Reports from the Dept of Phonetics, Umeå University, PHONUM 4. Öster, A-M. (1998). Spoken L2 teaching with con¬trastive visual and auditory feedback, Proc ICSLP, Sydney. Öster, A-M. (1999a). Strategies and results from spoken L2 teaching with audio-visual feedback, STL-QPSR 1-2/99, 1-7. Öster, A-M., Vicsi, K., Roach, P., Kacic, Z., & Barczikay, P. (1999b). A multimedia multilingual teaching and training system for speech and hearing-impaired children – SPECO, Proceedings of Fonetik 99, 149-152. Gothenburg, Sweden. Öster, A-M. (2002a). The relationship between residual hearing and speech intelligibility - Is there a measure that could predict a prelingually profoundly deaf child's possibility to develop intelligible speech? STL/QPSR Vol. 43, 51-56. Öster, A-M., House, D., Protopapas, A., Hatzis, A. (2002b). Presentation of a new EU project for speech therapy: OLP (Ortho-Logo-Pedia), Proceedings of Fonetik 2002, QPSR, Vol. 44, 45-48. Öster, A-M., House, D., Hatzis, A., Green, P. (2003). Testing a New Method for Training Fricatives using Visual Maps in the OrthoLogo-Paedia Project (OLP), Umeå University, Department of Philosophy and Linguistics, PHONUM 9 (2003), 197-X, Available online at http://www.ling.umu.se/fonetik2003/. Owens, E. and Blazek, B. (1985). 
Visems observed by hearingimpaired and normal hearing adult viewers, Journal of Speech and Hearing Research, 28, 381-393. Pickett, J. M. and Constam, A. (1968). A visual speech trainer with simplified indication of vowel spectrum, American Annals of the Deaf, 113, 253-258. 221 Computer-Based Speech Therapy Using Visual Feedback Picket, J. M. (1980). Tactual Communication of Speech Sounds to the Deaf: Comparison with Lipreading, IEE Press, Sensory Aids for the Hearing-Impaired, edited by Levitt, H., Pickett, J. M., Houde, R. A, 262-277. Piske, T., MacKay, I., Flege, J. (2001). Factors affecting degree of foreign accent in an L2: A review, Journal of Phonetics, 29, 191– 215. Plant, G. (1960). The Plant-Mandy voice trainer – Some notes by the designer, Teacher of the Deaf, 58, 12-15. Plant, G., and Hammarberg, B. (1983). Acoustic and perceptual Nlysis of the speech of the deafend”, STL-QPSR 2-3/1983 pp. 85107. Povel, D. J., Arends, N. (1991). The Visual Speech Apparatus: Theoretical and Practical Aspects, Speech Communication, 10, 5980. Preisler, G. (1991). Early patterns of interaction between blind infants and their sighted mothers, Child: care, health and development, 17, 65-90. Proctor, A. (1995). Tactile Aid Usage in Young Deaf Children, Profound Deafness and Speech Comm., London: Whurr Publ. Ltd., 111-147, Plant and Spens (Ed.). Pronovost, W., Yenkin, L., Anderson, D.C., Learner, R. (1968). The Voice Visualizer, Ameraican Annals of the Deaf, 113, 230-238. Protopapas, A. (2004). User’s Manual, OLP document QL1971-ILSIN-C-097-a3. Risberg, A. (1968). Visual aids for speech correction, American Annals of the Deaf, 113, 2, 178-194. Risberg, A. & Màrtony, J. (1970). A method for the classification of audiograms. In G. Fant (Ed), Speech Communication Ability and Profound Hearing-Impairedness. Washington, D.C.: A. G. Bell Association for the Profoundly Hearing-impaired, 135-139. Risberg, A. (1976). 
Diagnostic rhyme test for speech audiometry with severely hard of hearing and profoundly deaf children, STLQPSR 2-3, 40-58. 222 References Risberg, A., Agelfors, E., Florén, Å. (1977). Avläsetest med spondéord, Preliminär rapport, 770323, Inst. för talöverföring, KTH, Stockholm. Risberg, A. (1979). Bestämning av hörkapacitet och talperceptionsförmåga vid svåra hörsel skador, Rapport TRITATLF-79-2, Doktorsavhandling, Inst. för Talöverföring, KTH, Stockholm. Risberg, A. (1982). Speech coding in aids for the deaf: An overview of research from 1924-1982, STL/QPSR 4, 65-98. Rooney, E., Carraro, F., Dempsey, W., Robertson, K., Vaughan, R., & Jack, M. (1994). HARP: an autonomous speech rehabilitation system for hearing-impaired people. In Proc. 1994 International Conference on Spoken Language Processing (ICSLP94), 2019-2022. Yokohama, Japan. Roug, L., Landberg, I.,Lundberg, L-J. (1989). Phonetic development in early infancy: A study of four Swedish children during the first eighteen months of life, Journal of Child Language, 16: 19-40. Saben, C.B. and Ingham, J.C. (1991). The effects of minimal pairs treatment on the speech-sound production of two children with phonologic disorders. Journal of Speech and Hearing Research, Vol.34, 1023-1040. Schulte, K. (1972). Fonator System: Speech stimulator and speech feedback by technically amplified one-channel vibration, 351353, in G. Fant (Ed.), Int. Symposium on Speech Communication Ability and Profound Deafness, A.G. Bell Association for the deaf, Washington. Shriberg, L.D. and Kwiatkowski, J. (1980). Natural Process Analysis (NPA). New York: John Wiley. Sjölander, K., Beskow, J. (2000). WaveSurfer - an Open Source Speech Tool, in Proceedings of ICSLP 2000, Bejing, China. Smith, C. (1975). Residual hearing and speech production in the deaf. Journal of Speech and Hearing Research, 19, 795–811. Soleymani, A. J. A., Southwood, M. H., McCutcheon, M. J. (1997). 
Design of Speech Illumina Mentor (SIM) for teaching speech to the hearing-impaired, Biomedical Engineering Conference, Proceedings of the 1997 Sixteenth Southern, Biloxi, MS, USA. 223 Computer-Based Speech Therapy Using Visual Feedback Spens, K-E. (1984). To hear with the skin. Dissertation, TRITA_TÖM 2-84, ISSN 0280-9850, KTH. Stach, B. A. (1998). Clinical Audiology: An Introduction, Clifton Park, NY: Singular. Stampe, D. (1979). A Dissertation on Natural Phonology. In Hankamer, I. (ed), Garland, New York. Stoel-Gammon, C. and Dunn, C. (1985). Normal and Disordered Phonology in Children, University Park Press, Baltimore, Md. Stoel-Gammon, C., and Otomo, K. (1986). Babbling development of hearing-impaired and normally hearing subjects, JSHD, Vol.51, 033-041. Summerfield, Q. (1979). Use of Visual Information for Phonetic Perception, Phonetica 36, 314-331. Tanner Dyson, A. (1988). Phonetic inventories of 2- and 3-year-old children, JSHD, Vol.53, 89-93. Thomas, I. B., Snell, R. C. (1970). Articulation training through visual speech Patterns, Volta Review, 310-318. Thorén, B. (1994). Betoningshandboken, Liten hjälpreda för oss som undervisar i svenska som andraspråk, BT Bättre Svenska, 2:a upplagan, Sundsvall. Thore’n, A (2002). Blind children and sighted parents in development and communication, Dissertation, 2002, ISBN 917265-540-2, Stockholm University. Traunmyller, H. (1980). The Sentiphone, a tactual speech communication aid, Journal Comm. Dis., 13, 183-193. Upton, H.W. (1968). Wearable Eyeglass speechreading Aid, American Annals for the Deaf, 113, 22-229. Vicsi, K., Roach, P., Öster A., Kacic, Z., Barczikay, P., Sinka, I. (1999). SPECO – A Multimedia Multilingual Teaching and Training System for Speech Handicapped Children, 6th European Conference on Speech Communication and Technology, Eurospeech ´99, Budapest, 859-862. Vicsi, K., Roach, P., Öster A., Kacic, Z., Barczikay, P., Tantos A., Csatári, F., Bakcsi, Zs., Sfakianaki, A. (2000). 
A Multimedia Multilingual Teaching and Training System For Speech 224 References Handicapped Children, International, Journal of Speech Technology, Vol. 3, 289-300. Wallace, V., Menn, L., Yoshinaga-Itano, C. (2000). Is babble the gateway to speech for all children? A longitudinal study of children who are deaf or hard of hearing, The Volta Review, Vol, 100 (5) pp. 121-148. Watanabe, A., Ueda, Y., Shigenaga, A. (1985). Color display system for connected speech to be used for the hearing, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol ASSP33, 1, 164-173. Watson, C. S., Reed, D., Kewley-Port, D., and Maki, D. (1989). The Indiana Speech Training Aid (ISTRA) I: Comparisons between human and computer-based evaluation of speech quality, Journal of Speech and Hearing Research, 32, 245-251. Weiner, F.F. (1979). Phonological Process Analysis (PPA), University Park Press, Baltimore, Md. West, J.J. and Weber, J.L. (1973). A phonological analysis of the spontaneous language of a four-year-old, hard-of-hearing child. JSHD, 38, 25-35. Whitehead, R.L. (1983). Some Respiratory and Aerodynamic Patterns in the Speech of the Hearing Impaired. In Hochberg, Levitt, Osberger, eds., Speech of the Hearing Impaired; Research, Training and Personal Preparation, Maryland, MD: University Park Press, 97-116. Wills, D.M. (1981). Some notes on the application of diagnostic profile to young blind children, Psychoanalytic Study of the Child, 36, 217-240. Woodward, M.F. and Barber, C.G. (1960). Phoneme perception in lipreading, Journal of Speech and Hearing Research, 3:3, 212-222. Wundt (1911): ref. by Mills, A. (1983), see above. Yamada, Y. & Murata, N. (1991). Computer Integrated Speech Training Aid, Intern. Symposium on Speech and Hearing Sciences, Osaka, Japan. Yoshinaga-Itano, C., Sedey, A. (2000). 
12. Appendices

12.1. Appendix 1: Diacritics to assess the speech of profoundly hearing-impaired children (see section 7.3 for references)

12.2. Appendix 2: Swedish SAMPA symbols

Definitions from: SAMPA home page, UCL Phonetics and Linguistics, University College London
http://www.phon.ucl.ac.uk/home/sampa/swedish.htm

Consonants

There are six plosives:

Symbol    Word     Transcription
p         pil      pi:l
b         bil      bi:l
t         tal      tA:l
d         dal      dA:l
k         kal      kA:l
g         gås      go:s

There are six fricatives:

Symbol    Word     Transcription
f         fil      fi:l
v         vår      vo:r
s         sil      si:l
S         sjuk     S}:k
h         hal      hA:l
C         tjock    COk

There are six sonorant consonants (nasals, liquids and semivowels):

Symbol    Word     Transcription
m         mil      mi:l
n         nål      no:l
N         ring     rIN
r         ris      ri:s
l         lös      l2:s
j         jag      jA:g

Vowels

There are nine long and nine short vowels.

Long vowels (followed by short consonant):

Symbol    Word     Transcription
i:        vit      vi:t
e:        vet      ve:t
E:        säl      sE:l
y:        syl      sy:l
}:        hus      h}:s
2:        föl      f2:l
u:        sol      su:l
o:        hål      ho:l
A:        hal      hA:l

Short vowels (followed by long consonant):

Symbol    Word     Transcription
I         vitt     vIt
e         vett     vet
E         rätt     rEt
Y         bytt     bYt
u0        buss     bu0s
2         föll     f2l
U         bott     bUt
O         håll     hOl
a         hall     hal

There are also two pre-r allophones (long and short) of /E/ and /2/.

The following important allophonic variants occur in Swedish which require separate symbolic representation:

Symbol    Word     Transcription    Description
{:        här      h{:r             pre-r allophone of E:
9:        för      f9:r             pre-r allophone of 2:
{         herr     h{r              pre-r allophone of E
9         förr     f9r              pre-r allophone of 2
@         pojken   pOjk@n           schwa vowel allophone
rt        hjort    jUrt             retroflex consonant, not initial*
rd        bord     bu:rd            retroflex consonant, not initial*
rn        barn     bA:rn            retroflex consonant, not initial*
rs        fors     fOrs             retroflex consonant, not initial*
rl        karl     kA:rl            retroflex consonant, not initial*

* In cases where the dental consonants do not change into retroflexes, they are transcribed using the separator sign (ASCII 45): r-t, r-d.

Swedish has two contrasting tonemes, but only in stressed syllables. Tone 1 is indicated by the ordinary stress mark, Tone 2 by a doubled stress mark, e.g.

                       Word     Transcription    Gloss
stress and toneme 1    anden    "and@n           (the duck)
stress and toneme 2    anden    ""and@n          (the spirit)

Note on the use of [S] for orthographic sj etc.: although [S] is an unambiguous way of transcribing this unusual sound of Swedish, some commentators find this symbol phonetically imprecise. Those who feel this way are free to use more elaborate symbols instead: [s`] or even [x\].

12.3. Appendix 3: Swedish questionnaire for evaluation of Box of Tricks