belief updating in spoken dialog systems - PowerPoint PPT Presentation

Presentation Transcript

  1. belief updating in spoken dialog systems • Dan Bohus, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213 • acknowledgements: Alex Rudnicky (CMU); Eric Horvitz & Tim Paek (MSR); Antoine Raux

  2. spoken dialog systems • use natural language processing technology • engage in a goal-oriented conversation • research community: information access, command-and-control; personal assistants, taskable agents, tutoring systems • industry: simpler automated phone systems

  3. the problem … misunderstandings lead to interaction breakdowns

  4. more concretely … (user's words, with the recognizer's output in brackets)
     S: What city are you leaving from?
     U: Urbana Champaign [OKAY IN THAT same PAY]
     S: Sorry I’m not sure I understood what you said. What city are you leaving from?
     U: Urbana Champaign [for minute since HEY]
     S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
     U: Chicago [CHICAGO]
     S: traveling from Chicago. Where would you like to go?
     U: Huntsville [SEOUL]
     S: traveling to Seoul. What day did you need to travel?
     U: no no I’m traveling to Birmingham [THE TRAVELING to berlin P_M]
     S: traveling in the afternoon. Okay what day would you be departing chicago
     U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
     S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
     U: my destination is Birmingham [flight destination mr WEEKEND]
     S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………

  5. understanding errors • stem from the speech recognition process • difficult operating conditions • typical word-error-rates: 20-30%, up to 50% for non-native speakers • strong negative impact on interactions
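Word error rate, the measure behind these figures and behind the task-success models later in the talk, is the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch; the function name `word_error_rate` is ours, and it is applied here to the first user turn from slide 4:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edits needed to turn the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (free when the words match)
            )
        prev = curr
    return prev[-1] / len(ref)


# "Urbana Champaign" recognized as [OKAY IN THAT same PAY]
print(word_error_rate("Urbana Champaign", "OKAY IN THAT same PAY"))  # 2.5
```

Because inserted words count as errors, a single turn can exceed 100% WER, as this one does (2 substitutions + 3 insertions over 2 reference words).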

  6. pathways to increased robustness • gracefully handle errors through conversation • improve recognition • detect the problems • develop a set of recovery strategies • know how to choose between them (policy)

  7. guarding against misunderstandings (confidence score of each recognized user turn in parentheses)
     S: […] What city are you leaving from?
     U: [for minute since HEY] (0.07)
     S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
     U: [CHICAGO] (0.72)
     S: traveling from Chicago. Where would you like to go?
     U: [SEOUL] (0.65)
     S: traveling to Seoul… What day did you need to travel?
     U: [THE TRAVELING to berlin P_M] (0.35)
     S: traveling in the afternoon. Okay what day would you be departing chicago
     U: [AT THE TENTH OF AUGUST] (0.58)
     S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
     U: [flight destination mr WEEKEND] (0.28)

  8. guarding against misunderstandings (same exchange as slide 7; the [SEOUL] turn yields arrival = {Seoul / 0.65}) • confirmation actions • reject • explicit confirmation: Did you say Seoul? • implicit confirmation: traveling to Seoul … What day did you need to travel? • accept
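One simple way to choose among these four actions is to compare the hypothesis' confidence score against fixed thresholds. A minimal sketch; the threshold values are hypothetical (the slide lists the actions but no cut-offs), picked so that Seoul at 0.65 lands in the implicit-confirmation band, as it did in the dialog:

```python
from enum import Enum


class Action(Enum):
    REJECT = "reject"
    EXPLICIT_CONFIRM = "explicit confirmation"
    IMPLICIT_CONFIRM = "implicit confirmation"
    ACCEPT = "accept"


# Hypothetical thresholds: not taken from the talk.
REJECT_BELOW = 0.30
EXPLICIT_BELOW = 0.60
IMPLICIT_BELOW = 0.80


def choose_action(confidence: float) -> Action:
    """Pick a confirmation action from the top hypothesis' confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < REJECT_BELOW:
        return Action.REJECT
    if confidence < EXPLICIT_BELOW:
        return Action.EXPLICIT_CONFIRM
    if confidence < IMPLICIT_BELOW:
        return Action.IMPLICIT_CONFIRM
    return Action.ACCEPT


print(choose_action(0.65).value)  # implicit confirmation
```

Fixed cut-offs are the simplest possible policy; slide 6 treats knowing how to choose between strategies as a policy question in its own right.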

  9. belief updating (same exchange as slide 7) • after each turn the system holds beliefs such as departure = { … } and arrival = { … }, which an update function f revises • e.g. arrival = {Seoul / 0.65}, followed by the 0.35-confidence response [THE TRAVELING to berlin P_M]: arrival = ?

  10. belief updating: problem statement • example: arrival = {Seoul / 0.65}; S: traveling to Seoul… What day did you need to travel? U: [THE TRAVELING to berlin P_M] (confidence 0.35); arrival = ? • given: an initial belief $B_{\text{initial}}(C)$ over concept C, a system action $SA(C)$, and a user response $R$ • construct an updated belief: $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

  11. outline • related work • proposed approach • data • experiments and results • effects on global performance • conclusion and future work related work : proposed approach : data : experiments and results: global performance : conclusion

  12. detecting misunderstandings and corrections • confidence annotation • word-level [Cox, Chase, Bansal, Ravishankar, etc.] • semantic confidence annotation [Walker, San-Segundo, Bohus, etc.] • correction detection [Litman, Swerts, Hirschberg, Krahmer, Levow]: detect when the user corrects the system • example: arrival = {Seoul / 0.65}; S: traveling to Seoul… What day did you need to travel? U: [THE TRAVELING to berlin P_M] (Conf=0.35, Corr=0.47); arrival = ?

  13. current solutions for tracking beliefs • most systems only track single values • new values overwrite old values • use simple heuristic rules • explicit confirmation (S: did you say you wanted to fly to Seoul?): yes → trust hypothesis; no → delete hypothesis; “other” → non-understanding • implicit confirmation (S: traveling to Seoul … what day did you need to travel?): rely on new values overwriting old values
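These heuristic rules (later used as the "heuristic baseline") can be sketched in a few lines. The single-value `Concept` record is our own illustration, and setting the confidence to 1.0 on a "yes" is an assumption: the slide only says "trust hypothesis".

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """Single-value belief: one hypothesis and its confidence (None = empty)."""
    value: Optional[str] = None
    confidence: float = 0.0


def after_explicit_confirm(concept: Concept, answer: str) -> Tuple[Concept, bool]:
    """Return (updated concept, non_understanding flag) after 'did you say X?'."""
    if answer == "yes":
        # Trust the hypothesis; confidence 1.0 is an assumption of this sketch.
        return replace(concept, confidence=1.0), False
    if answer == "no":
        return Concept(), False  # delete the hypothesis
    return concept, True  # any other response is treated as a non-understanding


def after_implicit_confirm(concept: Concept, new_value: Optional[str],
                           new_confidence: float = 0.0) -> Concept:
    """After 'traveling to X … <next question>': a new value overwrites the old one."""
    if new_value is None:
        return concept  # nothing new heard, so the old value silently survives
    return Concept(new_value, new_confidence)


# Assuming no arrival value was parsed from [THE TRAVELING to berlin P_M]
# (the system later says "arrives Seoul"), the heuristic keeps Seoul:
arrival = Concept("Seoul", 0.65)
print(after_implicit_confirm(arrival, None))  # Concept(value='Seoul', confidence=0.65)
```

The weakness is visible in the last line: an implicit confirmation only changes the belief if the correction happens to produce a new value.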

  14. outline • related work • proposed approach • data • experiments and results • effects on global performance • conclusion and future work

  15. belief updating: problem statement • example: arrival = {Seoul / 0.65}; S: traveling to Seoul… What day did you need to travel? U: [THE TRAVELING to berlin P_M] (confidence 0.35); arrival = ? • given: an initial belief $B_{\text{initial}}(C)$ over concept C, a system action $SA(C)$, and a user response $R$ • construct an updated belief: $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

  16. belief representation: $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$ • most accurate representation: a probability distribution over the set of possible values (e.g. departure: YUMA, AZ; ALPINE, TX; ALPENA, MI; ALBANY, NY; ABILENE, TX; ALLIANCE, NE; ABERDEEN, TX; ALLAKAKET, AK; ALLENTOWN, PA; ALEXANDRIA, LA; ALBUQUERQUE, NM) • however, the system “hears” only a small number of conflicting values for a concept throughout a session • at most 3 conflicting values heard • more than 1 value heard in only 7% of cases

  17. belief representation: $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$ • compressed belief representation • k hypotheses + other • dynamically add and drop hypotheses • remember m hypotheses, add n new ones (m+n=k) • B…(C) is a multinomial variable of degree k+1 • example, departure_city [k=3, m=2, n=1]: S: Did you say you were flying from Austin? U: [NO ASPEN], then S: flying from Aspen… what is your destination? U: [NO NO I DIDN’T THAT THAT] [diagram: the tracked hypotheses (drawn from Austin, Houston, Boston, Aspen, other, and an empty slot Ø) before and after each turn]
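A minimal sketch of how the k tracked hypotheses could be chosen before each update: keep the m most likely hypotheses already in the belief, add up to n newly heard values, and fold the probability of every dropped hypothesis into other. The function names and the example probabilities are hypothetical; in the approach described here, the updated probabilities over the k+1 outcomes come from the learned model, not from this bookkeeping.

```python
from typing import Dict, List

OTHER = "other"


def select_hypotheses(belief: Dict[str, float], heard: List[str], m: int, n: int) -> List[str]:
    """Choose the k = m + n hypotheses to track: m remembered, up to n newly heard."""
    remembered = sorted((h for h in belief if h != OTHER),
                        key=lambda h: belief[h], reverse=True)[:m]
    new = [v for v in heard if v not in remembered][:n]
    return remembered + new


def compress(belief: Dict[str, float], tracked: List[str]) -> Dict[str, float]:
    """Distribution over the tracked hypotheses plus 'other' (k + 1 outcomes).

    Untracked mass goes to 'other'; a value absent from the old belief starts
    at 0, since its probability was previously part of 'other'.
    """
    compressed = {h: belief.get(h, 0.0) for h in tracked}
    compressed[OTHER] = max(0.0, 1.0 - sum(compressed.values()))
    return compressed


# Hypothetical numbers for departure_city with k=3, m=2, n=1, after "[NO ASPEN]".
before = {"Boston": 0.5, "Austin": 0.3, "Houston": 0.1, OTHER: 0.1}
tracked = select_hypotheses(before, heard=["Aspen"], m=2, n=1)
print(tracked)  # ['Boston', 'Austin', 'Aspen']
print(compress(before, tracked))
```

Houston is dropped and its mass joins other, so the variable stays a multinomial of degree k+1 however many values the recognizer produces over a session.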

  18. system action: $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

  19. user response: $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$

  20. approach: $B_{\text{updated}}(C) \leftarrow f(B_{\text{initial}}(C), SA(C), R)$ • multinomial regression problem • multinomial generalized linear model • sample efficient • stepwise approach • feature selection • BIC to control over-fitting • one separate model for each system action: $B_{\text{updated}}(C) \leftarrow f_{SA(C)}(B_{\text{initial}}(C), R)$
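At runtime, each per-action model maps a feature vector (built from the initial belief and the user response) to a distribution over the k+1 outcomes. A minimal sketch of that prediction step as a multinomial logit, plus the BIC score that stepwise feature selection would compare across candidate feature sets. The weights, feature layout, and action labels are hypothetical placeholders, not fitted values from the talk, and the stepwise search itself is omitted.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: Sequence[Sequence[float]], features: Sequence[float]) -> List[float]:
    """Multinomial logit: one weight row per outcome (k hypotheses + other)."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)


# One separate model per system action; k=2 + other gives three outcome rows.
# Hypothetical features: [bias, initial confidence of hyp 1, user said "no"].
# The 'other' row is fixed at zero as the reference category (a common convention).
MODELS: Dict[str, List[List[float]]] = {
    "explicit_confirm": [[0.0, 3.0, -4.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
    "implicit_confirm": [[0.0, 2.0, -2.0], [0.0, -0.5, 0.5], [0.0, 0.0, 0.0]],
}


def update(action: str, features: Sequence[float]) -> List[float]:
    """B_updated(C) = f_SA(C)(B_initial(C), R): dispatch on the system action."""
    return predict(MODELS[action], features)


print(update("implicit_confirm", [1.0, 0.65, 1.0]))
```

Keeping one model per action keeps each model small, which fits the slide's emphasis on sample efficiency.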

  21. outline • related work • proposed approach • data • experiments and results • effects on global performance • conclusion and future work

  22. data • collected with RoomLine • a phone-based mixed-initiative spoken dialog system • conference room reservation • explicit and implicit confirmations • simple heuristic rules for belief updating • explicit confirm: yes / no • implicit confirm: new values overwrite old ones

  23. corpus • user study • 46 participants (first-time users) • 10 scenario-based interactions each • corpus • 449 sessions, 8848 user turns • orthographically transcribed • manually annotated • misunderstandings • corrections • correct concept values

  24. outline • related work • proposed approach • data • experiments and results • effects on global performance • conclusion and future work

  25. models • k=2 + other (m=1, n=1) • k=3 + other (m=2, n=1) • k=4 + other (m=3, n=1) • full model: all features • basic model: all features except priors and confusability • runtime model: all features available at runtime

  26. baselines • initial baseline: accuracy of system beliefs before the update • heuristic baseline: accuracy of the heuristic update rule used by the system • correction baseline: accuracy if we knew exactly when the user corrects the system

  27. results for k=2 hyps + other [chart: four panels (explicit confirm, implicit confirm, request, other), each comparing the initial baseline (i), heuristic baseline (h), basic model (BM), full model (FM) and runtime model (RM); the two confirmation panels also show the correction baseline (c)]

  28. a question remains … … does this really matter?

  29. outline • related work • proposed approach • data • experiments and results • effects on global performance • conclusion and future work

  30. a new user study … • implemented models in RavenClaw • 40 participants: first-time, non-native users (improvements more likely at high word-error-rates) • 10 scenario-driven interactions each • between-subjects; 2 gender-balanced groups • control: RoomLine using heuristic update rules • treatment: RoomLine using runtime models

  31. effect on task success • logistic ANOVA on task success (p=0.009): logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition • [chart: probability of task success vs. word error rate (both 0% to 100%) for treatment and control; treatment at 30% word error rate: 78%; control at 30% word error rate: 64%; control at 16% word error rate: 78%]
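The fitted logistic model can be evaluated directly. A minimal sketch, assuming WER is measured in percentage points and Condition is 1 for the treatment group and 0 for control; this coding is our inference, supported by the fact that it reproduces the 78% and 64% figures on this slide.

```python
import math


def task_success_probability(wer: float, treatment: bool) -> float:
    """Invert logit(TaskSuccess) = 2.09 - 0.05*WER + 0.69*Condition.

    Assumes WER in percentage points and Condition = 1 for treatment, 0 for control.
    """
    logit = 2.09 - 0.05 * wer + 0.69 * (1 if treatment else 0)
    return 1.0 / (1.0 + math.exp(-logit))


for wer, treatment in [(30, True), (30, False), (16, False)]:
    group = "treatment" if treatment else "control"
    print(f"{group:9s} at {wer}% WER: {task_success_probability(wer, treatment):.0%}")
# treatment at 30% WER: 78%
# control   at 30% WER: 64%
# control   at 16% WER: 78%

# WER points the control would need to shed to match the treatment effect
print(round(0.69 / 0.05, 1))  # 13.8
```

The ratio of the two coefficients explains why the chart pairs 30% with 16% WER: the treatment effect is worth roughly 14 points of word error rate.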

  32. how about efficiency? • ANOVA on task duration for successful tasks (p=0.0003): Duration ← -0.21 + 0.013∙WER - 0.106∙Condition • significant improvement • equivalent to a 7.9% absolute reduction in word-error-rate
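The "equivalent reduction in word-error-rate" reads the Condition coefficient in units of the WER coefficient: how many points of WER the control would have to lose to shorten tasks as much as the treatment does. With the rounded coefficients shown on the slide:

```latex
\Delta\mathrm{WER}_{\text{equiv}}
  = \frac{\lvert \beta_{\text{Condition}} \rvert}{\beta_{\text{WER}}}
  = \frac{0.106}{0.013}
  \approx 8.2
```

This is close to the reported 7.9%; the gap presumably comes from rounding of the displayed WER coefficient (0.106 / 0.0134 ≈ 7.9).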

  33. outline • related work • proposed approach • data • experiments and results • effects on global performance • conclusion and future work

  34. summary • approach for constructing accurate beliefs • integrate information across multiple turns • large gains in task success and efficiency • [figure: the dialog from U: [CHICAGO] through "traveling in the afternoon…", with beliefs departure = { … } and arrival = { … } tracked across turns (confidences 0.72, 0.65, 0.35), and f updating arrival = {Seoul / 0.65} to arrival = ?]

  35. other advantages • learns from data • tuned to the domain in which it operates • sample efficient / scalable • performs a local one-turn optimization • works independently on concepts • portable • decoupled from dialog task specification • no strong assumptions about dialog management

  36. future work • integrate information from n-best list • integrate other high-level knowledge • domain-specific constraints • inter-concept dependencies • unsupervised / implicit learning • domain-specificity

  37. thank you! questions …

  38. improvements at different WER [chart: absolute improvement in task success vs. word-error-rate]

  39. user study • 10 scenarios, fixed order • presented graphically (explained during briefing) • participants compensated per task success

  40. informative features • priors and confusability • initial confidence scores • concept identity • barge-in • expectation match • repeated grammar slots