ISCSLP@INTERSPEECH 2014 - The 9th International Symposium on Chinese Spoken Language Processing
12-14 September 2014, Singapore


Saturday, 13 September 2014

The ISCSLP 2014 Organising Committee is pleased to announce the following four tutorials, presented by distinguished speakers, to be offered on Saturday, 13 September 2014. Each tutorial will be two (2) hours in duration, and registration is free for ISCSLP 2014 delegates.

The tutorial handouts will be provided electronically, ahead of the tutorials. Please download and print at your convenience, as we will not be providing hard copies of these at the conference.


1330 – 1530


Adaptation Techniques for Statistical Speech Recognition
- Kai Yu


Emotion and Mental State Recognition: Features, Models, System Applications and Beyond
- Chung-Hsien Wu, Hsin-Min Wang, Julien Epps and Vidhyasaharan Sethu


1600 – 1800


Unsupervised Speech and Language Processing via Topic Models
- Jen-Tzung Chien


Deep Learning for Speech Generation and Synthesis
- Yao Qian and Frank K. Soong


Title: Adaptation Techniques for Statistical Speech Recognition
Presenters: Kai Yu (Shanghai Jiao Tong University, Shanghai)

Abstract: Adaptation is a technique for making better use of existing models on test data from new acoustic or linguistic conditions. It is an important and challenging research area of statistical speech recognition. This tutorial gives a systematic review of the fundamental theories as well as an introduction to state-of-the-art adaptation techniques, covering both acoustic and language model adaptation. Following a simple example of acoustic model adaptation, the basic concepts, procedures and categories of adaptation will be introduced. Then, a number of advanced adaptation techniques will be discussed, such as discriminative adaptation, Deep Neural Network adaptation, adaptive training and the relationship to noise robustness. After the detailed review of acoustic model adaptation, an introduction to language model adaptation, such as topic adaptation, will also be given. The tutorial concludes with a summary and a discussion of future research directions.

Biography: Kai Yu is a research professor in the Computer Science and Engineering Department of Shanghai Jiao Tong University, China. He obtained his Bachelor and Master degrees from Tsinghua University, Beijing, China and his Ph.D. from Cambridge University. He has published over 50 peer-reviewed journal and conference publications on speech recognition, synthesis and dialogue systems. He was a key member of the Cambridge team that built state-of-the-art LVCSR systems in the DARPA-funded EARS and GALE projects. He has also managed the design and implementation of a large-scale real-world ASR cloud. He is a senior member of IEEE and a member of ISCA and the IET. He was the area chair for speech recognition and processing for INTERSPEECH 2009 and EUSIPCO 2011, the publication chair for IEEE ASRU 2011 and the area chair of spoken dialogue systems for INTERSPEECH 2014. He was selected for the "1000 Overseas Talent Plan (Young Talent)" by the Chinese central government in 2012. He was also selected for the Programme for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.



Title: Emotion and Mental State Recognition: Features, Models, System Applications and Beyond
Presenters: Chung-Hsien Wu (National Cheng Kung University, Tainan City), Hsin-Min Wang (Academia Sinica, Taipei), Julien Epps (The University of New South Wales, Australia) and Vidhyasaharan Sethu (The University of New South Wales, Australia)

Abstract: Emotion recognition is the ability to identify what a person is feeling from moment to moment and to understand the connection between feelings and expressions. In today’s world, human-computer interaction (HCI) interfaces undoubtedly play an important role in our daily life. Toward harmonious HCI interfaces, the automated analysis and recognition of human emotion has attracted increasing attention from researchers in multidisciplinary research fields. A specific area of current interest that also has key implications for HCI is the estimation of cognitive load (mental workload), research into which is still at an early stage. Technologies for processing daily activities, including speech, text and music, have expanded the interaction modalities between humans and computer-supported communication artifacts.

In this tutorial, we will present theoretical and practical work offering new and broad views of the latest research in emotional awareness from audio and speech. We cover several topics spanning a variety of theoretical backgrounds and applications, ranging from salient emotional features and emotional-cognitive models, through compensation methods for variability due to speaker and linguistic content, to machine learning approaches applicable to emotion recognition. For each topic, we will review the state of the art by introducing current methods and presenting several applications. In particular, the application to cognitive load estimation will be discussed, from its psychophysiological origins to system design considerations. Eventually, technologies developed in different areas will be combined for future applications, so in addition to a survey of future research challenges, we will envision a few scenarios in which affective computing can make a difference.

Biography: Prof. Chung-Hsien Wu received the Ph.D. degree in electrical engineering from National Cheng Kung University, Tainan, Taiwan, R.O.C., in 1991. Since August 1991, he has been with the Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan. He became professor and distinguished professor in August 1997 and August 2004, respectively. From 1999 to 2002, he served as the Chairman of the Department. Currently, he is the deputy dean of the College of Electrical Engineering and Computer Science, National Cheng Kung University. He also worked at the Computer Science and Artificial Intelligence Laboratory of the Massachusetts Institute of Technology (MIT), Cambridge, MA, in summer 2003 as a visiting scientist. He received the Outstanding Research Award of the National Science Council in 2010 and the Distinguished Electrical Engineering Professor Award of the Chinese Institute of Electrical Engineering, Taiwan, in 2011. He is currently an associate editor of IEEE Transactions on Audio, Speech and Language Processing, IEEE Transactions on Affective Computing, ACM Transactions on Asian Language Information Processing, and the Subject Editor on Information Engineering of the Journal of the Chinese Institute of Engineers (JCIE). His research interests include affective speech recognition, expressive speech synthesis, and spoken language processing. Dr. Wu is a senior member of IEEE and a member of the International Speech Communication Association (ISCA). He was the President of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP) from 2009 to 2011. He was the Chair of the IEEE Tainan Signal Processing Chapter and has been the Vice Chair of the IEEE Tainan Section since 2009.

Biography: Dr. Hsin-Min Wang received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University in 1989 and 1995, respectively. In October 1995, he joined the Institute of Information Science, Academia Sinica, where he is now a research fellow and deputy director. He was an adjunct associate professor with National Taipei University of Technology and National Chengchi University. He currently serves as the president of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP), a managing editor of Journal of Information Science and Engineering, and an editorial board member of International Journal of Computational Linguistics and Chinese Language Processing. His major research interests include spoken language processing, natural language processing, multimedia information retrieval, and pattern recognition. Dr. Wang received the Chinese Institute of Engineers (CIE) Technical Paper Award in 1995 and the ACM Multimedia Grand Challenge First Prize in 2012. He is a senior member of IEEE, a member of ISCA and ACM, and a life member of Asia Pacific Signal and Information Processing Association (APSIPA), ACLCLP, and Institute of Information & Computing Machinery (IICM).

Biography: Dr Julien Epps received the BE and PhD degrees in Electrical Engineering from the University of New South Wales, Australia, in 1997 and 2001 respectively. After an appointment as a Postdoctoral Fellow at the University of New South Wales, he worked on speech recognition and speech processing research firstly as a Research Engineer at Motorola Labs and then as a Senior Researcher at National ICT Australia. He was appointed as a Senior Lecturer in the UNSW School of Electrical Engineering and Telecommunications in 2007 and then as an Associate Professor in 2013. Dr Epps has also held visiting academic and research appointments at The University of Sydney and the A*STAR Institute for Infocomm Research (Singapore). He has authored or co-authored around 150 publications, which have been collectively cited more than 1500 times. He has served as a reviewer for most major speech processing journals and conferences and as a Guest Editor for the EURASIP Journal on Advances in Signal Processing Special Issue on Emotion and Mental State Recognition from Speech. He has also co-organised or served on the committees of key workshops related to this tutorial, such as the ACM ICMI Workshop on Inferring Cognitive and Emotional States from Multimodal Measures (2011), ASE/IEEE Int. Conf. on Social Computing Workshop on Wide Spectrum Social Signal Processing (2012), 4th International Workshop on Corpora for Research on Emotion, Sentiment and Social Signals (Satellite of LREC 2012), Audio/Visual Emotion Challenge and Workshop AVEC 2011 (part of the Int. Conf. on Affective Computing and Intelligent Interaction), AVEC 2012 (part of ACM ICMI) and AVEC 2013 (part of ACM Multimedia). His research interests include applications of speech modelling to emotion and mental state classification and speaker verification.

Biography: Dr Vidhyasaharan Sethu received his BE degree from Anna University, India, and his MEngSc (Signal Processing) degree from the University of New South Wales, Australia. He was awarded his PhD in 2010 for his work on Automatic Emotion Recognition, by the University of New South Wales (UNSW). Following this, he worked as a Postdoctoral Research Fellow at the speech research group at UNSW on the joint modelling of linguistic and paralinguistic information in speech with a focus on emotion recognition. He is currently a Lecturer in Signal Processing at the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia. He teaches courses on speech processing, signal processing and electrical system design in the school and is a reviewer for a number of journals including Speech Communication and EURASIP Journal on Audio, Speech and Music Processing and IEEE Transactions on Education. His research interests include emotion recognition, speaker recognition, language identification and the application of machine learning in speech processing.



Title: Unsupervised Speech and Language Processing via Topic Models
Presenters: Jen-Tzung Chien (National Chiao Tung University, Hsinchu)

Abstract: In this tutorial, we will present state-of-the-art machine learning approaches for speech and language processing, with a focus on unsupervised methods for structural learning from unlabeled sequential patterns. In general, speech and language processing involves extensive knowledge of statistical models. A flexible, scalable and robust system is required to meet heterogeneous and nonstationary environments in the era of big data. This tutorial starts with an introduction to unsupervised speech and language processing based on factor analysis and independent component analysis. Unsupervised learning is then generalized to a latent variable model known as the topic model. The evolution of topic models from latent semantic analysis to the hierarchical Dirichlet process, from non-Bayesian parametric models to Bayesian nonparametric models, and from single-layer models to hierarchical tree models will be surveyed in an organized fashion. Inference approaches based on variational Bayes and Gibbs sampling are introduced. We will also present several case studies on topic modeling for speech and language applications, including language models, document models, retrieval models, segmentation models and summarization models. Finally, we will point out new trends in topic models for speech and language processing.

Biography: Jen-Tzung Chien received his Ph.D. degree in electrical engineering from National Tsing Hua University, Hsinchu, in 1997. From 1997 to 2012, he was with National Cheng Kung University, Tainan. Since 2012, he has been with the Department of Electrical and Computer Engineering, National Chiao Tung University (NCTU), Hsinchu, where he is currently a Distinguished Professor. He serves as an adjunct professor in the Department of Computer Science, NCTU. He has held visiting researcher positions at Panasonic Technologies Inc., Santa Barbara, CA; the Tokyo Institute of Technology, Tokyo, Japan; the Georgia Institute of Technology, Atlanta, GA; Microsoft Research Asia, Beijing, China; and the IBM T. J. Watson Research Center, Yorktown Heights, NY. His research interests include machine learning, speech recognition, information retrieval and blind source separation. He served as an associate editor of the IEEE Signal Processing Letters from 2008 to 2011, a guest editor of the IEEE Transactions on Audio, Speech and Language Processing in 2012, an organization committee member of ICASSP 2009, and an area coordinator of Interspeech 2012. He was appointed an APSIPA Distinguished Lecturer for 2012-2013. He received the Distinguished Research Award from the National Science Council in 2006 and 2010, and was a co-recipient of the Best Paper Award of the IEEE Automatic Speech Recognition and Understanding Workshop in 2011. Dr. Chien has served as a tutorial speaker for ICASSP 2012 in Kyoto, Interspeech 2013 in Lyon, and APSIPA 2013 in Kaohsiung.




Title: Deep Learning for Speech Generation and Synthesis
Presenters: Yao Qian and Frank K. Soong (Microsoft Research Asia, Beijing)

Abstract: Deep learning, which can represent high-level abstractions in data with an architecture of multiple non-linear transformations, has made a huge impact on automatic speech recognition (ASR) research, products and services. However, deep learning for speech generation and synthesis (i.e., text-to-speech), the inverse process of speech recognition (i.e., speech-to-text), has not yet generated momentum similar to that seen in ASR. Recently, motivated by the success of Deep Neural Networks in speech recognition, several neural-network-based research attempts have succeeded in improving the performance of statistical parametric speech generation/synthesis. In this tutorial, we focus on deep learning approaches to problems in speech generation and synthesis, especially Text-to-Speech (TTS) synthesis and voice conversion.

First, we review the current mainstream of statistical parametric speech generation and synthesis, i.e., GMM-HMM based speech synthesis and GMM-based voice conversion, with emphasis on analyzing the major factors responsible for the quality problems in GMM-based voice synthesis/conversion and the intrinsic limitations of decision-tree based contextual state clustering and state-based statistical distribution modeling. We then present the latest deep learning algorithms for feature parameter trajectory generation, in contrast to deep learning for recognition or classification. We cover common technologies in Deep Neural Networks (DNN) and improved DNNs: Mixture Density Networks (MDN), Recurrent Neural Networks (RNN) with Bidirectional Long Short-Term Memory (BLSTM), and Conditional RBMs (CRBM). Finally, we share our research insights and hands-on experience in building speech generation and synthesis systems based upon deep learning algorithms.

Biography: Yao Qian is a Lead Researcher in the Speech Group, Microsoft Research Asia. She received her Ph.D. from the Department of Electronic Engineering, The Chinese University of Hong Kong, in 2005, and joined Microsoft Research Asia in September 2005, right after receiving her Ph.D. Her research interests are in spoken language processing, including TTS speech synthesis and automatic speech recognition. Her recent research projects include speech synthesis, voice transformation, prosody modeling and Computer-Assisted Language Learning (CALL). She has over 50 publications in international journals and conference proceedings, and ten U.S. patent applications, five of which have been issued. She has been recognized within Microsoft and in the speech research community for her contributions to TTS and many other speech technologies. She is a senior member of IEEE and a member of ISCA.


Biography: Frank K. Soong is a Principal Researcher in the Speech Group, Microsoft Research Asia (MSRA), Beijing, China, where he works on fundamental research on speech and its practical applications. His professional research career spans over 30 years, first with Bell Labs, US, then with ATR, Japan, before joining MSRA in 2004. At Bell Labs, he worked on stochastic modeling of speech signals, optimal decoder algorithms, speech analysis and coding, and speech and speaker recognition. He was responsible for developing the recognition algorithm that went into voice-activated mobile phone products rated by Mobile Office Magazine (Apr. 1993) as "outstandingly the best". He is a co-recipient of the Bell Labs President Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) software package.

He has served as a member of the Speech and Language Technical Committee of the IEEE Signal Processing Society and in other society functions, including as an Associate Editor of the IEEE Transactions on Speech and Audio Processing and as chair of IEEE international workshops. He has published extensively, with more than 200 papers, and co-edited the widely used reference book Automatic Speech and Speaker Recognition: Advanced Topics (Kluwer, 1996). He is a visiting professor of The Chinese University of Hong Kong (CUHK) and a few other top-rated universities in China, and the co-Director of the MSRA-CUHK Joint Research Lab. He received his BS, MS and PhD degrees from National Taiwan University, the University of Rhode Island, and Stanford University, respectively, all in Electrical Engineering. He is an IEEE Fellow.




Copyright © 2013-2014 Chinese and Oriental Languages Information Processing Society
Conference managed by Meeting Matters International