The IBM 2006 Spoken Term Detection System.

Slide 1

The IBM 2006 Spoken Term Detection System Jonathan Mamou IBM Haifa Research Labs Olivier Siohan Bhuvana Ramabhadran IBM T. J. Watson Research Center

Slide 2

Outline
- System description
- Indexing
  - Audio processing for each source type: generation of CTM, word confusion networks (WCN) and phone transcripts
  - Index generation and storage
- Search
- Experiments / Results

Slide 3

System Overview [diagram]
OFFLINE INDEXING: for each source, the ASR systems produce a word transcript and a phone transcript; the INDEXER builds a word index and a phone index from them.
SEARCH: given a term from the STD term list, the SEARCHER extracts posting lists (from the word index for in-vocabulary terms, from the phone index for OOV terms), merges them, scores the candidates, and makes the decision to produce the result.

Slide 4

Audio Processing

Slide 5

Broadcast News Transcription System (BN)

Slide 6

Conversational Telephone Speech Transcription System (CTS) D. Povey, B. Kingsbury, L. Mangu, G. Saon, H. Soltau, G. Zweig, "fMPE: Discriminatively Trained Features for Speech Recognition", in Proceedings International Conference on Acoustics Speech and Signal Processing, Philadelphia, PA, 2005.

Slide 7

Meeting Transcription System (confmtg) Huang, J. et al, “The IBM RT06S Speech-To-Text Evaluation System", NIST TR06S Workshop, May 3-4, 2006.

Slide 8

Phonetic Lattice Generation O. Siohan, M. Bacchiani, "Fast vocabulary independent audio search using path based graph indexing", Proceedings of Interspeech 2005, Lisbon, pp. 53-56.
Two-stage algorithm:
- Generate sub-word lattices using word fragments as decoding units
- Convert the word-fragment lattices into phonetic lattices
Required resources:
- A word-fragment inventory
- A word-fragment lexicon
- A word-fragment language model
Main issue: designing a fragment inventory

Slide 9

Fragment-based inventory design
- Use a word-based system to convert the training material to phone strings
- Train a phone n-gram with "large n" (say 5)
- Prune the phone n-gram using entropy-based pruning: A. Stolcke, "Entropy-based pruning of backoff language models", in Proceedings DARPA Broadcast News Transcription and Understanding Workshop, pp. 270-274, Lansdowne, VA, Feb. 1998.
- Use the retained n-grams as the selected fragments (the n-gram structure guarantees coverage of all strings)
- Phonetic pronunciations for word fragments are trivial
- Train a fragment-based n-gram model for use in the fragment-based ASR system
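The fragment-selection steps above can be sketched as follows. The slide prunes a phone n-gram language model with Stolcke's entropy-based criterion; as a stand-in, this toy keeps every n-gram seen at least `min_count` times (count-based pruning, a simplification of the actual method). Single phones are always kept so that any phone string stays coverable, mirroring the coverage guarantee on the slide. The corpus and threshold here are invented for illustration.

```python
from collections import Counter

def select_fragments(phone_strings, max_n=5, min_count=2):
    """Build a word-fragment inventory from phone strings.

    Stand-in for entropy-based pruning (Stolcke 1998): keep every
    n-gram observed at least min_count times. Single phones are
    always retained so every string can be covered.
    """
    counts = Counter()
    for phones in phone_strings:
        for n in range(1, max_n + 1):
            for i in range(len(phones) - n + 1):
                counts[tuple(phones[i:i + n])] += 1
    return {g for g, c in counts.items() if len(g) == 1 or c >= min_count}

# Tiny invented phone corpus
corpus = [["HH", "AH", "L", "OW"], ["HH", "AH", "L", "P"], ["L", "OW", "N"]]
frags = select_fragments(corpus, max_n=3, min_count=2)
# ("HH", "AH", "L") occurs twice, so it survives the pruning
```

The retained fragments then serve as decoding units, and their pronunciations are trivially the phone tuples themselves.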

Slide 10


Slide 11

Indexing
- Indices are stored using Juru storage
- Juru is a full-text search library written in Java, developed at IBM: D. Carmel, E. Amitay, M. Herscovici, Y. S. Maarek, Y. Petruschka, and A. Soffer, "Juru at TREC 10 - Experiments with Index Pruning", Proceedings of TREC-10, NIST 2001.
- We have adapted the Juru storage model in order to store speech-related data (e.g. begin time, duration)
- The posting lists are compressed using classical index compression methods (d-gap): Gerard Salton and Michael J. McGill, Introduction to modern information retrieval, 1983.

Slide 12

Indexing Algorithm
Input: a corpus of word/sub-word transcripts
Process:
1. Extract the indexing units from the transcript
2. For each indexing unit (word or sub-word), store its posting in the index:
   - transcript/speaker identifier (tid)
   - begin time (bt)
   - duration
   - for WCN: posterior probability, and rank relative to the other hypotheses
Output: an index on the corpus of transcripts
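The posting record above can be sketched as a small inverted index. This is not the Juru storage model itself, just an illustration of the fields the slide lists; the class name and example identifiers are invented.

```python
from collections import defaultdict

class SpeechIndex:
    """Toy inverted index whose postings carry the speech metadata
    from the slide: transcript id, begin time, duration, and (for
    WCN input) posterior probability and hypothesis rank."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, unit, tid, bt, dur, posterior=None, rank=None):
        self.postings[unit].append(
            {"tid": tid, "bt": bt, "dur": dur,
             "posterior": posterior, "rank": rank})

    def lookup(self, unit):
        return self.postings.get(unit, [])

idx = SpeechIndex()
idx.add("prosody", tid="bn_001", bt=12.48, dur=0.61, posterior=0.92, rank=0)
idx.add("prosody", tid="cts_042", bt=3.10, dur=0.58, posterior=0.40, rank=2)
```

Keeping begin time and duration in the posting is what later lets the searcher verify ordering and adjacency of the units of a multi-word term.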

Slide 13

Search System

Slide 14

In-Vocabulary Search
Miss probability can be reduced by expanding the 1-best transcript with additional words, taken from the other hypotheses provided by the WCN transcript. Such an expansion will probably reduce miss probability while increasing FA probability! We need a suitable scoring model in order to reduce the FA probability by penalizing "bad" results. J. Mamou, D. Carmel and R. Hoory, "Spoken Document Retrieval from Call-center conversations", Proceedings of SIGIR, 2006

Slide 15

Improving Retrieval Effectiveness for In-Voc Search
Our scoring model relies on two pieces of information provided by the WCN:
- the posterior probability of the hypothesis given the signal: it reflects the confidence level of the ASR in the hypothesis
- the hypothesis's rank among the other alternatives: it reflects the relative importance of the occurrence
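A minimal sketch of how those two pieces of information could be combined. The slide does not give the actual formula, so the geometric rank discount below is purely illustrative, not the system's scoring model:

```python
def occurrence_score(posterior, rank, rank_decay=0.9):
    """Illustrative score for one WCN hypothesis: discount the ASR
    posterior by a geometric factor per rank position (rank 0 is the
    top hypothesis). The decay constant is an invented parameter."""
    return posterior * (rank_decay ** rank)
```

Under any such scheme, low-confidence or deeply ranked hypotheses contribute weak scores, which is what lets the decision threshold suppress the false alarms that WCN expansion introduces.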

Slide 16

Improving Retrieval Effectiveness with OOV Search
BN model: 39 OOV queries. CTS model: 117 OOV queries. CONFMTG model: 89 OOV queries.
Since the accuracy of the phone transcript is worse than that of the word transcript, we use the phone transcript only for OOV keywords. This tends to reduce miss probability without affecting FA probability too much.

Slide 17

Grapheme-to-phoneme conversion
OOV query terms are converted to phone sequences using a joint Maximum Entropy N-gram model. Given a letter sequence L, find the phone sequence P* that maximizes Pr(L, P): P* = argmax_P Pr(L, P). Details in Stanley Chen, "Conditional and Joint Models for Grapheme-to-Phoneme Conversion", in Proc. of Eurospeech 2003.
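The maximization can be illustrated with a toy joint ("graphone") model: each unit pairs a letter chunk with a phone chunk and carries a probability, and dynamic programming finds the best joint segmentation. The real system trains a joint Maximum Entropy n-gram model (Chen 2003); the unigram table and probabilities below are invented purely for illustration.

```python
# Invented unigram graphone table: (letter chunk, phone chunk) -> prob
GRAPHONES = {
    ("ph", "F"): 0.9, ("p", "P"): 0.8, ("h", "HH"): 0.7,
    ("o", "OW"): 0.6, ("o", "AA"): 0.3, ("n", "N"): 0.9, ("e", ""): 0.5,
}

def g2p(word):
    """Find the phone sequence P* maximizing Pr(L, P) under the toy
    unigram graphone model, by DP over letter prefixes."""
    n = len(word)
    best = [(0.0, None)] * (n + 1)   # best[j] = (prob, (prev_pos, phones))
    best[0] = (1.0, None)
    for i in range(n):
        p_i, _ = best[i]
        if p_i == 0.0:
            continue
        for (letters, phones), prob in GRAPHONES.items():
            j = i + len(letters)
            if word[i:j] == letters and p_i * prob > best[j][0]:
                best[j] = (p_i * prob, (i, phones))
    out, pos = [], n                  # backtrace from the full word
    while pos > 0 and best[pos][1] is not None:
        prev, phones = best[pos][1]
        if phones:
            out.insert(0, phones)
        pos = prev
    return out

# g2p("phone") -> ["F", "OW", "N"]
```

The ("ph", "F") unit beats the ("p", "P")+("h", "HH") path (0.9 vs 0.56), which is exactly the kind of many-to-many alignment a joint model captures and a per-letter mapping cannot.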

Slide 18

Search Algorithm
Input: a query term, word-based index, sub-word based index
Process:
1. Extract the query keywords
2. For in-vocabulary query keywords, extract the posting lists from the word-based index
3. For OOV query keywords, convert the keywords to sub-words and extract the posting list of each sub-word from the sub-word index
4. Merge the different posting lists according to the timestamps of the occurrences in order to create results matching the query:
   - check that the words and sub-words appear in the right order according to their begin times
   - check that the words/sub-words are adjacent (less than 0.5 sec for word-word and word-phoneme, and less than 0.2 sec for phoneme-phoneme)
Output: the set of all the matches of the given term
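Step 4 above can be sketched as follows. One plausible reading of "adjacent" is used: the gap between the end of one unit and the begin time of the next must be below the threshold, with units in the same transcript and in begin-time order. The posting format `(tid, begin_time, duration)` follows the indexing slide; the recursive enumeration is our own simplification.

```python
def merge_postings(posting_lists, kinds, word_gap=0.5, phone_gap=0.2):
    """Merge per-unit posting lists into term matches: units must come
    from the same transcript, appear in begin-time order, and be
    adjacent (< 0.5 s for word-word/word-phone, < 0.2 s phone-phone).
    posting_lists[i] holds (tid, bt, dur) tuples; kinds[i] is
    "word" or "phone"."""
    matches = []

    def extend(partial, i):
        if i == len(posting_lists):
            matches.append(tuple(partial))
            return
        for tid, bt, dur in posting_lists[i]:
            if partial:
                ptid, pbt, pdur = partial[-1]
                limit = phone_gap if (kinds[i - 1] == "phone"
                                      and kinds[i] == "phone") else word_gap
                end = pbt + pdur
                if tid != ptid or bt < end or bt - end >= limit:
                    continue
            extend(partial + [(tid, bt, dur)], i + 1)

    extend([], 0)
    return matches

white = [("t1", 1.0, 0.3)]
house = [("t1", 1.35, 0.4), ("t1", 5.0, 0.4), ("t2", 1.35, 0.4)]
hits = merge_postings([white, house], ["word", "word"])
# only the t1 occurrence at 1.35 s is adjacent to "white"
```

In practice the per-unit lists would be scanned in sorted order rather than enumerated exhaustively, but the adjacency logic is the same.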

Slide 19

Search Algorithm [diagram]
Query → extract the terms in the query. In-Voc terms: extract posting lists from the word index. OOV terms: extract posting lists from the phone index. Merge based on begin time and adjacency (word-word, word-phone: < 0.5 s; phone-phone: < 0.2 s) → set of matches for all terms in the query.

Slide 20

Scoring for the hard decision
We have boosted the score of multi-word terms. Decision thresholds are set by analysis of the DET curve obtained on the development set. We have used different threshold values per source type.
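The per-source thresholding amounts to a small lookup at decision time. A minimal sketch; the threshold values below are invented, since the slide only says they were tuned on the development-set DET curve:

```python
# Hypothetical per-source decision thresholds (invented numbers),
# standing in for values tuned on the development-set DET curve.
THRESHOLDS = {"bn": 0.40, "cts": 0.50, "confmtg": 0.65}

def hard_decision(score, source_type):
    """Return the YES/NO decision for a candidate occurrence,
    using the threshold of its source type."""
    return "YES" if score >= THRESHOLDS[source_type] else "NO"
```

Using a separate threshold per source type reflects that the three transcription systems have very different error rates, so the same score means different confidence depending on the source.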

Slide 21

Primary and contrast system differences
Primary system (WCN): WCN for all the source types, CONFMTG transcripts generated using the BN model. Merged with phonetic 1-best transcripts for BN and CTS.
Contrastive 1 (WCN-C): same as primary except that the WCN of CONFMTG was generated using the CONFMTG model.
Contrastive 2 (CTM): CTM for all the source types, CONFMTG transcripts generated using the BN model. Merged with phonetic 1-best transcripts for BN and CTS.
Contrastive 3 (1-best-WCN): 1-best path extracted from the WCN, CONFMTG transcripts generated using the BN model. Merged with phonetic 1-best transcripts for BN and CTS.

Slide 22


Slide 23

Results
Retrieval performance is improved by using WCNs relative to the 1-best path, and by using the 1-best from the WCN rather than the CTM. Our ATWV is close to the MTWV; we have used appropriate thresholds for penalizing bad results.

Slide 25

Performance per condition and term duration: in general we performed better on long terms.

Slide 26

System characteristics (Eval)
- Index size: 0.3267 MB/HP (compression of the index storage)
- Indexing time: 7.5627 HP/HS
- Search speed: 0.0041 sec.P/HS
- Index memory usage: 1653.4297 MB
- Search memory usage: 269.1250 MB

Slide 27

Conclusion
Our system combines a word retrieval approach with a phonetic retrieval approach. Our work exploits the additional information provided by WCNs: extending the 1-best transcript with all the hypotheses of the WCN, taking confidence levels into account and boosting by term rank. ATWV is increased compared to the 1-best transcript. Miss probability is significantly improved by indexing all the hypotheses provided by the WCN. Decision scores are set to NO for "bad" results in order to attenuate the effect of the false alarms added by the WCN.
