Build a Serverless Text-to-Speech Application with Amazon Polly

1 hour 30 minutes 8 Credits

SPL-201 - Version 1.0.12

© 2020 Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited.

Errors, corrections or other questions? Contact us at AWS Training and Certification.


In general, speech synthesis is not easy. You cannot assume that when an application reads each letter of a sentence, the output will make sense. A few common challenges for text-to-speech applications include:

  • Words that are written the same way, but that are pronounced differently: I live in Las Vegas compared to This presentation broadcasts live from Las Vegas.
  • Text normalization: Disambiguating abbreviations, acronyms, and units: St., which can be expanded as Street or Saint.
  • Converting text to phonemes in languages with complex mapping, such as, in English, tough, through, and though. In this example, similar parts of different words can be pronounced differently depending on the word and context.
  • Foreign words (déjà vu), proper names (François Hollande) and slang (ASAP, LOL).

Amazon Polly provides speech synthesis functionality that overcomes these challenges, allowing you to focus on building applications that use text-to-speech instead of addressing interpretation challenges.

Amazon Polly turns text into lifelike speech. It lets you create applications that talk naturally, enabling you to build entirely new categories of speech-enabled products. Amazon Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. It currently includes dozens of lifelike voices in over 20 languages, so you can select the ideal voice and build speech-enabled applications that work in many different countries.

In addition, Amazon Polly delivers the consistently fast response times required to support real-time, interactive dialog. You can cache and save Amazon Polly's audio files for offline replay or redistribution. (In other words, what you convert and save is yours. There are no additional text-to-speech charges for using the speech.) Amazon Polly is also easy to use: you send the text you want to convert to the Amazon Polly API, and Amazon Polly immediately returns the audio stream to your application so that it can play the audio directly or store it in a standard audio file format such as MP3.
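
As a rough illustration of that request and response cycle, the sketch below sends text to Amazon Polly and reads back the audio stream. It assumes the AWS SDK for Python (boto3) and uses its synthesize_speech call; the helper names synthesize_to_mp3 and save_audio, the voice Joanna, and the file name hello.mp3 are illustrative choices, not part of this lab.

```python
def synthesize_to_mp3(polly_client, text, voice_id="Joanna"):
    """Send text to Amazon Polly and return the synthesized MP3 as bytes.

    polly_client is expected to be a boto3 Polly client (an assumption of
    this sketch); any object with a compatible synthesize_speech method works.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a file-like streaming body; read() returns the raw audio.
    return response["AudioStream"].read()


def save_audio(audio_bytes, path):
    """Store the audio in a local file so it can be replayed offline."""
    with open(path, "wb") as audio_file:
        audio_file.write(audio_bytes)


# Possible usage, assuming boto3 is installed and AWS credentials are set up:
#   import boto3
#   polly = boto3.client("polly")
#   save_audio(synthesize_to_mp3(polly, "Hello from Amazon Polly."), "hello.mp3")
```

Passing the client in as a parameter keeps the helper easy to try out with a stand-in object before any AWS resources exist.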

In this lab you will create a basic, serverless application that uses Amazon Polly to convert text to speech. The application has a simple user interface that accepts text in many different languages and then converts it into audio files that you can play from a web browser. This lab will use blog posts, but you can use any type of text. For example, you can use the application to read recipes while you are preparing a meal, or news articles or books while you are driving or riding a bike.

Application Architecture

You will build a serverless application, which means that you will not need to work with servers — no provisioning, no patching, no scaling. The AWS Cloud automatically takes care of this, allowing you to focus on your application.

The application provides two methods – one for sending information about a new post, which should be converted into an MP3 file, and one for retrieving information about the post (including a link to the MP3 file stored in an Amazon S3 bucket). Both methods are exposed as RESTful web services through Amazon API Gateway.
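
To make the two methods more concrete, here is a minimal sketch of how a client might build the two HTTP requests. Everything specific in it is an assumption for illustration: the invoke URL, the JSON field names voice and text, and the postId query parameter are hypothetical and may differ from what you configure later in the lab.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical invoke URL; the real one is assigned when the API is deployed.
API_URL = "https://example.execute-api.us-west-2.amazonaws.com/dev"


def build_new_post_request(text, voice, api_url=API_URL):
    """Build the POST request that submits a new post for conversion."""
    body = json.dumps({"voice": voice, "text": text}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_get_post_request(post_id, api_url=API_URL):
    """Build the GET request that retrieves information about a post."""
    query = urllib.parse.urlencode({"postId": post_id})
    return urllib.request.Request(f"{api_url}?{query}", method="GET")


# Sending a request (only meaningful once a real API exists):
#   with urllib.request.urlopen(build_get_post_request("some-id")) as resp:
#       print(json.loads(resp.read()))
```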

When the application sends information about new posts:

1 The information is received by the RESTful web service exposed by Amazon API Gateway. This web service is invoked by a static webpage hosted on Amazon Simple Storage Service (Amazon S3).

2 Amazon API Gateway triggers an AWS Lambda function, New Post, which is responsible for initializing the process of generating MP3 files.

3 The Lambda function inserts information about the post into an Amazon DynamoDB table, where information about all posts is stored.

4 To run the whole process asynchronously, you will use Amazon Simple Notification Service (Amazon SNS) to decouple receiving information about new posts from starting their audio conversion.

5 Another Lambda function, Convert to Audio, is subscribed to your SNS topic and is triggered whenever a new message appears (which means that a new post should be converted into an audio file).

6 The Convert to Audio Lambda function uses Amazon Polly to convert the text into an audio file in the specified language (the same as the language of the text).

7 The new MP3 file is saved in a dedicated S3 bucket.

8 Information about the post is updated in the DynamoDB table. The URL to the audio file stored in the S3 bucket is saved with the previously stored data.
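
The eight steps above can be sketched as two small Python functions, one for each Lambda function. This is only an outline under stated assumptions, not the code you will deploy in the lab: it assumes boto3-style clients are passed in, and the attribute names (id, text, voice, status, url), the status values, the object key, and the URL format are all hypothetical.

```python
import uuid


def new_post(table, sns_client, topic_arn, text, voice):
    """Steps 2-4: store the post, then announce it on the SNS topic."""
    post_id = str(uuid.uuid4())
    table.put_item(
        Item={"id": post_id, "text": text, "voice": voice, "status": "PROCESSING"}
    )
    # Only the ID is published; the subscriber looks up the rest in DynamoDB.
    sns_client.publish(TopicArn=topic_arn, Message=post_id)
    return post_id


def post_id_from_sns_event(event):
    """Step 5: extract the message body from an SNS-triggered Lambda event."""
    return event["Records"][0]["Sns"]["Message"]


def convert_to_audio(table, polly_client, s3_client, bucket, post_id):
    """Steps 6-8: synthesize the text, save the MP3, record its URL."""
    item = table.get_item(Key={"id": post_id})["Item"]
    response = polly_client.synthesize_speech(
        Text=item["text"], OutputFormat="mp3", VoiceId=item["voice"]
    )
    key = f"{post_id}.mp3"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=response["AudioStream"].read(),
        ContentType="audio/mpeg",
    )
    # Assumed URL format; the object must be readable by the browser.
    url = f"https://{bucket}.s3.amazonaws.com/{key}"
    # "status" and "url" are DynamoDB reserved words, hence the #placeholders.
    table.update_item(
        Key={"id": post_id},
        UpdateExpression="SET #s = :s, #u = :u",
        ExpressionAttributeNames={"#s": "status", "#u": "url"},
        ExpressionAttributeValues={":s": "UPDATED", ":u": url},
    )
    return url


# Possible wiring with boto3 (the table name "posts" is hypothetical):
#   import boto3
#   table = boto3.resource("dynamodb").Table("posts")
#   sns, polly, s3 = (boto3.client(n) for n in ("sns", "polly", "s3"))
```

Because the first function returns as soon as the message is published, the caller does not wait for the audio to be generated. The sketch sends the whole text in a single request; a long post would likely need to be split into smaller pieces first, which is omitted here.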

When the application retrieves information about posts:

1 The RESTful web service is deployed using Amazon API Gateway. Amazon API Gateway exposes the method for retrieving information about posts. The returned information contains the text of the post and the link to the S3 bucket where the MP3 file is stored. The web service is invoked by a static webpage hosted on Amazon S3.

2 Amazon API Gateway invokes the Get Post Lambda function, which contains the logic for retrieving the post data.

3 The Get Post Lambda function retrieves information about the post (including the reference to Amazon S3) from the DynamoDB table and returns the information.
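
The retrieval path is simpler, and might look like the following sketch. It assumes a boto3-style DynamoDB table object whose key attribute is named id, and it treats "*" as a request for all posts; both the attribute name and the "*" convention are assumptions made for illustration.

```python
def get_post(table, post_id):
    """Return matching posts as a list of items (empty if none is found).

    table is expected to behave like a boto3 DynamoDB Table resource.
    """
    if post_id == "*":
        # A single scan call returns at most one page of results;
        # pagination is left out of this sketch.
        return table.scan()["Items"]
    response = table.get_item(Key={"id": post_id})
    item = response.get("Item")
    return [item] if item else []
```

Returning a list in both cases means the webpage can render the result the same way whether it asked for one post or for all of them.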

Topics covered

By the end of this lab, you will be able to:

  • Create an Amazon DynamoDB table to store data
  • Create an Amazon API Gateway RESTful API
  • Create AWS Lambda functions triggered by API Gateway
  • Connect AWS Lambda functions with Amazon Simple Notification Service (SNS)
  • Use Amazon Polly to synthesize speech in a variety of languages and voices

Start Lab

  1. At the top of your screen, launch your lab by clicking Start Lab

This will start the process of provisioning your lab resources. An estimated amount of time to provision your lab resources will be displayed. You must wait for your resources to be provisioned before continuing.

If you are prompted for a token, use the one distributed to you (or credits you have purchased).

  2. Open your lab by clicking Open Console

This will automatically log you into the AWS Management Console.

Please do not change the Region unless instructed.

Common login errors

Error: Federated login credentials

If you see this message:

  • Close the browser tab to return to your initial lab window
  • Wait a few seconds
  • Click Open Console again

You should now be able to access the AWS Management Console.

Error: You must first log out

If you see the message, You must first log out before logging into a different AWS account:

  • Click the click here link
  • Close your browser tab to return to your initial Qwiklabs window
  • Click Open Console again
