2
|
- Pronunciation tuition:
- Use techniques from ASR to decide if a spoken utterance is a ‘good
pronunciation’
- People speak to communicate. Everybody sounds different, and we don’t
worry too much about ‘good’ pronunciation provided that we get the message
- What is ‘good pronunciation’?
- Reading tuition:
- Use techniques from ASR to track reading against a text and detect
‘incorrectly read words’
- Should the emphasis be on accurate reading and pronunciation, or on
fluency and comprehension?
- What are ‘incorrectly read words’?
|
|
4
|
- Computer Aided Pronunciation Learning
- Arabic for non-native speakers
- Application: Correct recitation of the Holy Qur’an
- Must be recited according to the Classical Arabic dialect
- Little allowed variation
- Commercial product: HAFSS
- HMM-based verification, with speaker adaptation
- Pre-generation of likely errors, confidence scoring
- Performance
- Detects 62% of pronunciation errors (6.6% of data)
- How is acceptable pronunciation defined?
- Are there other, similar applications?
|
|
5
|
- Computer Aided Pronunciation Learning
- When do pronunciation variants indicate poor reading skills?
- Children with Spanish language background
- Evaluation (“acceptable” vs “unacceptable”):
- 20 human evaluators (with varying degrees of familiarity with Spanish)
- Broad agreement amongst human assessors
- Comparison of:
- Human assessment, automatic threshold-based methods, and automatic
decision-tree-based methods (trained on ‘human voting’ and transcriptions)
- Automatic methods perform about as well as human evaluators
- Transcription good enough to train decision tree
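
The comparison on this slide can be illustrated with a minimal sketch of a threshold-based method checked against a human majority vote. The scores, votes, and threshold are hypothetical values, not data from the study, and the function names `majority_vote`, `classify`, and `agreement` are illustrative only.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most evaluators (ties count as 'unacceptable')."""
    counts = Counter(votes)
    if counts["acceptable"] > counts["unacceptable"]:
        return "acceptable"
    return "unacceptable"


def classify(score, threshold=0.5):
    """Threshold-based decision on a hypothetical pronunciation score in [0, 1]."""
    return "acceptable" if score >= threshold else "unacceptable"


def agreement(scores, human_votes, threshold=0.5):
    """Fraction of utterances where the automatic label matches the human majority."""
    matches = sum(
        classify(score, threshold) == majority_vote(votes)
        for score, votes in zip(scores, human_votes)
    )
    return matches / len(scores)


# Hypothetical data: one automatic score and three evaluator votes per utterance.
scores = [0.9, 0.2, 0.6, 0.4]
human_votes = [
    ["acceptable", "acceptable", "unacceptable"],
    ["unacceptable", "unacceptable", "unacceptable"],
    ["acceptable", "unacceptable", "acceptable"],
    ["acceptable", "acceptable", "unacceptable"],
]

print(agreement(scores, human_votes))
```

In practice the threshold would be tuned on held-out human judgements. A decision tree would replace `classify` with a model trained on the same votes.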
|
|
6
|
- Even though ‘good pronunciation’ is not well defined, we have an
automatic system which is able to replicate subjective human judgements
- By exploring how these classifiers work, can we derive a better
understanding of how people decide between acceptable and unacceptable
pronunciations?
|
|
7
|
- Automatic reading tuition
- Discussion paper: three ASR functions considered:
- Tracking reader’s position in text
- Detecting reading mistakes that a human would correct
- Measuring word reading times
- Subjective conclusion is that all three can be achieved to some extent:
- Tracking can be done well enough to detect skips or the end of a sentence,
but not always well enough to identify which word was misread
- Mistake detection can be done well enough to avoid frustration and to
detect some mistakes a human would correct, but not to indicate which words
are wrong
- Timing can be done well enough to estimate fluency
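
As a rough illustration of the third function, word reading times and a simple fluency estimate can be derived from a time-aligned transcript. This is a minimal sketch that assumes an aligner has already produced `(word, start, end)` tuples in seconds. The alignment values and the names `word_durations` and `words_per_minute` are hypothetical and are not taken from the paper.

```python
def word_durations(alignment):
    """Return (word, duration in seconds) for each aligned word."""
    return [(word, end - start) for word, start, end in alignment]


def words_per_minute(alignment):
    """Estimate reading rate from the first word's start to the last word's end."""
    if not alignment:
        return 0.0
    total = alignment[-1][2] - alignment[0][1]
    return 60.0 * len(alignment) / total if total > 0 else 0.0


# Hypothetical alignment: (word, start time, end time) in seconds.
alignment = [("the", 0.0, 0.25), ("cat", 0.5, 1.0), ("sat", 1.5, 2.0)]

print(word_durations(alignment))
print(words_per_minute(alignment))
```

The rate includes pauses between words, so long hesitations lower the estimate. Whether that is the right notion of fluency is a design choice.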
|
|
8
|
- Automatic selection of authentic texts for lexical practice and reading
comprehension in ESL teaching
- REAP system, deployed at the English Language Institute (ELI), University
of Pittsburgh
- Documents gathered from web:
- Analysed for occurrence of target vocabulary, syntactic features,
length and readability
- Only about 0.5% of retrieved documents suitable
- Assessment
- Positive user opinions
- Analysis of email correspondence from curriculum supervisor
- Grades to be assigned according to progress with REAP
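
The document analysis step on this slide can be sketched as a simple filter. This is a minimal illustration, not REAP's actual pipeline: it checks only target-vocabulary coverage and length, and it omits syntactic features and readability. The function names, thresholds, and example texts are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_suitable(text, target_words, min_targets=2, min_len=5, max_len=1000):
    """Accept a document if its length is in range and it covers enough target words."""
    tokens = tokenize(text)
    if not (min_len <= len(tokens) <= max_len):
        return False
    found = set(tokens) & set(target_words)
    return len(found) >= min_targets


# Hypothetical target vocabulary and candidate documents.
targets = {"analyse", "establish", "policy"}
docs = [
    "The committee will analyse the data and establish a policy.",
    "Hello world",
]

print([is_suitable(doc, targets) for doc in docs])
```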
|
|
9
|
- Problems in automatic pronunciation assessment and reading tuition
expose our lack of real understanding of variability in speech
- Statistical ASR techniques accommodate variability at a local level but
dodge long-term phenomena, such as accent, which provide a framework to
explain this variability
- We need computationally useful models of phenomena such as accent. Could
STiLL stimulate research that contributes to better ASR?
|
|
10
|
- If ASR techniques could accurately identify unacceptable pronunciations,
how would we exploit this? What feedback could we give?
- Unified models for recognition and synthesis?
- Articulatory-based ASR?
|
|
11
|
- Even imperfect STiLL has some serious advantages:
- Patient
- Non-judgemental
- Unlimited time for one-to-one attention
- …
- Thanks to the authors, and to the session organisers
|