The Discourse - PowerPoint PPT Presentation

Presentation Transcript

  1. The Speech Speech • casey chesnut • brains-N-brawn.com • Madison .NET • April 2007

  2. Powerpoint • Page Up • Page Down

  3. brains-N-brawn.com • Pervasive Computing • Tablet PC (MVP 03) • Compact Framework (MVP 04) • Advanced Web Services (MVP 05) • Media Center (MVP 06) • Speech • Location Based Services • Artificial Intelligence • 3D

  4. Outline • Speech Overview • Vista Speech Recognition • SAPI 5.3 / System.Speech • Speech Server 2007

  5. Outline : Speech Overview • Voice User Interface • How does it work? • Synthesis (TTS) • Recognition (SR)

  6. Overview • Speech is just another presentation system • Synthesis = Output to user • Recognition = User input • Voice User Interface (VUI)

  7. VUI Modes • Applications • Multi-modal • Voice-only

  8. VUI Tips • Don't replicate the touch-tone-based menu system • Restrict options on the main (opening) menu to 4 or fewer • Make sure your opening greeting is short • Don't design the app solely for the new user • Focus on task completion above all • What can I say? http://blogs.msdn.com/anandis_thoughts/archive/2006/02/08/528181.aspx

  9. Speech Synthesis • Text to Speech • Dynamic • Prompt database

  10. How Synthesis Works • Text parsing • Sentences, numbers, symbols, pauses • Natural language processing • Part of speech, tense • Phonemes are looked up or sounded out • Diphones are appended together • Post process audio to add emphasis • Play speech audio
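
The first step on this slide, text parsing, can be illustrated with a small Python sketch. It is not how SAPI or any particular engine does it: the `SYMBOLS` and `ONES` tables and the function names are hypothetical, and numbers are simply read digit by digit. It only shows the idea of splitting text into sentences and expanding numbers and symbols into speakable words before any phoneme lookup happens.

```python
import re

# Hypothetical lookup tables for this sketch; a real engine uses far larger ones.
SYMBOLS = {"%": "percent", "&": "and", "$": "dollars", "@": "at"}
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]


def split_sentences(text):
    """Split on sentence-ending punctuation that is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def spell_digits(match):
    """Read a run of digits one digit at a time, e.g. '53' -> 'five three'."""
    return " ".join(ONES[int(d)] for d in match.group())


def normalize(sentence):
    """Expand digits and known symbols into words a synthesizer can pronounce."""
    sentence = re.sub(r"\d+", spell_digits, sentence)
    for symbol, word in SYMBOLS.items():
        sentence = sentence.replace(symbol, f" {word} ")
    # Collapse the extra spaces introduced by the replacements.
    return re.sub(r"\s+", " ", sentence).strip()


if __name__ == "__main__":
    for s in split_sentences("Hello there. Press 5 & wait!"):
        print(normalize(s))
```

A real front end would also decide how a number should be read (as a date, a quantity, a phone number), which is where the natural language processing step comes in.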

  11. How Synthesis Works • Demo • /xnaSynth app • Article • http://www.brains-N-brawn.com/ttSpeech/ • http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)

  12. Speech Recognition • Speech to Text • Dictation • Command and Control

  13. How Recognition Works • Audio signal is processed • Look for signals which might be speech • Phonemes are found in audio signals • Phonemes are mapped to a dictionary of words • Dictation or grammar-based • Apply natural language processing
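
The step where phonemes are mapped to dictionary words can be sketched as a toy lookup in Python. The `LEXICON` entries, the ARPAbet-style symbols, and the greedy longest-match strategy are all assumptions made for illustration; real recognizers score many competing hypotheses statistically rather than matching exactly.

```python
# Hypothetical pronunciation dictionary (ARPAbet-style symbols, illustration only).
LEXICON = {
    ("P", "L", "EY"): "play",
    ("S", "T", "AA", "P"): "stop",
    ("M", "Y", "UW", "Z", "IH", "K"): "music",
}
MAX_LEN = max(len(key) for key in LEXICON)


def decode(phonemes):
    """Greedily map the longest matching phoneme run to a dictionary word."""
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest possible run first, then shorter ones.
        for length in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += length
                break
        else:
            # No dictionary entry starts here; skip one phoneme as noise.
            i += 1
    return words


if __name__ == "__main__":
    print(decode(["P", "L", "EY", "M", "Y", "UW", "Z", "IH", "K"]))
```

A small dictionary like this corresponds to the grammar-based (command and control) case; dictation needs a far larger vocabulary, which is one reason it is harder to get right.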

  14. How Recognition Works • Demo • /wavReader app • Article • http://www.brains-N-brawn.com/noReco/ • http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)

  15. Outline : Vista Speech Recognizer • Built-in to Vista’s shell • Microphone bar • Language support • Can be trained to improve accuracy • Command-and-control, also Dictation • Automagic application support • Horrible Office integration • UAC problems

  16. Demo • Say what you see • Show numbers • Correct • Spell it • Mouse grid http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/

  17. High Risk Demo

  18. Hack http://news.bbc.co.uk/1/hi/technology/6320865.stm • /micBarExtend – tap and talk

  19. Narrator • Vista’s screen reader

  20. Outline : SAPI 5.3 / System.Speech • Desktop applications • SAPI 5.3 • System.Speech

  21. SAPI 5.3 • COM based • Native applications • Managed apps which need more control

  22. System.Speech • Part of .NET 3.0 WPF • Managed wrapper built on SAPI 5.3 • Simple API • Standards support (SSML, SRGS) • Language support • Vista Speech Recognition integration • Does not work in XBAP

  23. System.Speech.Synthesis • SpeechSynthesizer • SSML • PromptBuilder • Voices
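
To make the SSML bullet concrete, here is a Python sketch that assembles a small SSML 1.0 document with the standard library. It does not call System.Speech; the `build_ssml` helper and its parameters are hypothetical. The `speak`, `break`, and `emphasis` elements and the namespace URI come from the W3C SSML 1.0 recommendation.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, emphasized, pause_ms=300, lang="en-US"):
    """Return an SSML string: plain text, a pause, then an emphasized phrase."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": lang,
    })
    speak.text = text + " "
    # <break time="..."/> inserts a pause of the given length.
    pause = ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    pause.tail = " "
    # <emphasis> asks the synthesizer to stress the enclosed words.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = emphasized
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome.", "Say a command."))
```

A markup string like this is the kind of input a PromptBuilder-style API produces for you, so hand-written SSML is mostly useful when prompts are stored or generated outside the application.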

  24. System.Speech.Synthesis • Demo • /speechSamples - /speechSynth

  25. System.Speech.Recognition • SpeechRecognizer / SpeechRecognitionEngine • SRGS • GrammarBuilder • Advanced users • Deep-link functionality • Mixed initiative
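
As an illustration of what an SRGS grammar looks like, the Python sketch below generates a minimal XML-form grammar with one rule listing alternative phrases. The `build_srgs` helper, the rule name, and the phrases are hypothetical; the `grammar`, `rule`, `one-of`, and `item` elements and the namespace URI follow the W3C SRGS 1.0 recommendation. It does not use GrammarBuilder or any other System.Speech class.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_srgs(rule_id, phrases, lang="en-US"):
    """Return an SRGS XML grammar whose root rule accepts any one phrase."""
    grammar = ET.Element("grammar", {
        "version": "1.0",
        "xmlns": SRGS_NS,
        "xml:lang": lang,
        "root": rule_id,  # names the rule the recognizer starts from
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    # <one-of> lists alternatives; each <item> is one allowed phrase.
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_srgs("command", ["play", "stop", "next track"]))
```

Restricting recognition to a short list of phrases like this is the command-and-control style of recognition, as opposed to open dictation.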

  26. System.Speech.Recognition • Demo • /speechSamples - /speechReco

  27. System.Speech • Demo • /micBarExtend • /mceSapiMcpl • Article • http://www.brains-N-brawn.com/speechSamples/ • http://www.brains-N-brawn.com/micBarExtend/ • http://www.brains-N-brawn.com/mceSapi/ (not updated for Vista yet)

  28. What about Mobile Devices • OEMs can add VoiceCommand • VoiceCommand is not accessible to developers • WindowsMobile has the SAPI API, but no engines • PlatformBuilder is supposed to have engines • There are 3rd party engines for purchase

  29. Outline : Speech Server 2007

  30. Speech Server 2007 • Telephony Applications • Outgoing calls • Speaker Independent

  31. Speech Server 2007 • VOIP • Language support • VoiceXML / SALT • Workflow development model • Reports • Still in beta

  32. Speech Server 2007 • Speech Synthesis • Inline • PromptBuilder • SSML • Prompt databases • Speech Recognition • Inline • Dynamic Grammar • SRGS • Conversational Grammar Builder • DTMF
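
DTMF, the last input type listed here, is the touch-tone signal a phone keypad sends: each key is the sum of one low-frequency and one high-frequency sine wave. The Python sketch below shows that encoding using the standard DTMF frequency grid. The function names, the 8000 Hz sample rate, and the 0.1 second duration are assumptions for illustration and have nothing to do with how Speech Server itself handles DTMF.

```python
import math

# Standard DTMF grid: each key combines one row tone and one column tone (Hz).
ROWS = (697, 770, 852, 941)
COLS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError(f"not a DTMF key: {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROWS[row], COLS[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration=0.1, sample_rate=8000):
    """Generate the tone for a key as floats in the range [-1.0, 1.0]."""
    low, high = dtmf_frequencies(key)
    count = round(duration * sample_rate)
    return [
        0.5 * (math.sin(2 * math.pi * low * n / sample_rate)
               + math.sin(2 * math.pi * high * n / sample_rate))
        for n in range(count)
    ]


if __name__ == "__main__":
    print(dtmf_frequencies("5"), len(dtmf_samples("5")))
```

Because the tones are fixed and unambiguous, DTMF is a common fallback in telephony applications when speech recognition fails or the caller is in a noisy place.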

  33. VoiceXML • Declarative language • Article • http://www.brains-N-brawn.com/vxml/ • http://www.brains-N-brawn.com/myVoices/ • http://www.brains-N-brawn.com/voiceBio/

  34. SALT • Yet another declarative language • Multimodal support has been dropped • Article • http://www.brains-N-brawn.com/noHands/ • http://www.brains-N-brawn.com/speechMulti/ • http://www.brains-N-brawn.com/tabletWeb/ • http://www.brains-N-brawn.com/mceSalt/

  35. Speech Workflow • Speech Sequence Workflow designer • Speech activities • Statement • QuestionAnswer • Debugging tools

  36. Speech Workflow • Demo • /speechTextAdv • /speakerVerify • /mobileRecord • Article • http://www.brains-N-brawn.com/speechTextAdv/ • http://www.brains-N-brawn.com/speakerVerify/

  37. Where • Accessibility • Telephony • Telematics • Home automation • Mobile Devices / Tablets • Gaming • Warehouses • …

  38. Possible Future • Telematics • Service Pack for Office Support • Exchange Server 2007 • Speech Server 2007 release • Rumors that WindowsMobile will get a public API • Dictation has room to improve • Hope that System.Speech will ultimately work in XBAP

  39. Questions