Build a Serverless Text-to-Speech Application with Amazon Polly

1 hour 30 minutes, 8 credits

SPL-201 - Version 1.0.15

© 2020 Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. All trademarks are the property of their owners.

Corrections, feedback, or other questions? Contact us at AWS Training and Certification.


In general, speech synthesis is not easy. You cannot assume that when an application reads each letter of a sentence, the output will make sense. A few common challenges for text-to-speech applications include:

  • Words that are written the same way but pronounced differently: "I live in Las Vegas" compared to "This presentation broadcasts live from Las Vegas."
  • Text normalization, that is, disambiguating abbreviations, acronyms, and units: "St." can be expanded as "Street" or "Saint."
  • Converting text to phonemes in languages with complex mappings. In English, for example, "tough," "through," and "though" contain similar letter sequences that are pronounced differently depending on the word and context.
  • Foreign words (déjà vu), proper names (François Hollande), and slang (ASAP, LOL).

Amazon Polly provides speech synthesis functionality that overcomes these challenges, allowing you to focus on building applications that use text-to-speech instead of addressing interpretation challenges.
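Amazon Polly resolves most of these cases on its own. When you want to state a reading explicitly, Polly also accepts SSML input, and its `<sub alias="...">` tag substitutes a spoken form for the written one. The following is a minimal sketch that builds such markup for the "St." example above. The helper names `sub` and `speak` are hypothetical and only illustrate the idea.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(written: str, spoken: str) -> str:
    """Wrap `written` so the synthesizer reads it aloud as `spoken`."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def speak(*parts: str) -> str:
    """Join already-escaped SSML fragments into one SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


# The same abbreviation is expanded two different ways in one sentence.
ssml = speak(
    escape("I live on Main "), sub("St.", "Street"),
    escape(" near "), sub("St.", "Saint"), escape(" Mary's church."),
)
print(ssml)
```

To have Polly interpret the string as markup rather than plain text, the request would set the text type to SSML.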

Amazon Polly turns text into lifelike speech. It lets you create applications that talk naturally, enabling you to build entirely new categories of speech-enabled products. Amazon Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. It currently includes dozens of lifelike voices in over 20 languages, so you can select the ideal voice and build speech-enabled applications that work in many different countries.

In addition, Amazon Polly delivers the consistently fast response times required to support real-time, interactive dialog. You can cache and save Polly's audio files for offline replay or redistribution. (In other words, what you convert and save is yours. There are no additional text-to-speech charges for using the speech.) Polly is also easy to use. You simply send the text you want to convert into speech to the Amazon Polly API. Amazon Polly immediately returns the audio stream to your application so that your application can play it directly or store it in a standard audio file format such as MP3.
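The request and response cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lab's code: the function name `synthesize_to_file`, the default voice, and the output path are hypothetical, and `polly` is assumed to be a client that behaves like `boto3.client("polly")`.

```python
from contextlib import closing


def synthesize_to_file(polly, text, path, voice_id="Joanna"):
    """Send `text` to Polly and write the returned MP3 stream to `path`.

    `polly` is assumed to behave like boto3.client("polly").
    Returns the number of audio bytes written.
    """
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,  # assumed default; any available voice works
    )
    # AudioStream is a file-like object: read it fully, then close it.
    with closing(response["AudioStream"]) as stream:
        audio = stream.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With boto3 installed and AWS credentials configured, a call could look like `synthesize_to_file(boto3.client("polly"), "Hello from Amazon Polly", "hello.mp3")`. Passing the client in as an argument keeps the function easy to exercise with a stand-in object.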

In this lab you will create a basic, serverless application that uses Amazon Polly to convert text to speech. The application has a simple user interface that accepts text in many different languages and then converts it into audio files that you can play from a web browser. This lab will use blog posts, but you can use any type of text. For example, you can use the application to read recipes while you are preparing a meal, or news articles or books while you are driving or riding a bike.

Application Architecture

You will build a serverless application, which means that you will not need to work with servers — no provisioning, no patching, no scaling. The AWS Cloud automatically takes care of this, allowing you to focus on your application.

The application provides two methods – one for sending information about a new post, which should be converted into an MP3 file, and one for retrieving information about the post (including a link to the MP3 file stored in an Amazon S3 bucket). Both methods are exposed as RESTful web services through Amazon API Gateway.

When the application sends information about new posts:

1 The information is received by the RESTful web service exposed by Amazon API Gateway. This web service is invoked by a static webpage hosted on Amazon Simple Storage Service (Amazon S3).

2 Amazon API Gateway triggers an AWS Lambda function, New Post, which is responsible for initializing the process of generating MP3 files.

3 The Lambda function inserts information about the post into an Amazon DynamoDB table, where information about all posts is stored.

4 To run the whole process asynchronously, you will use Amazon Simple Notification Service (Amazon SNS) to decouple the process of receiving information about new posts and starting their audio conversion.

5 Another Lambda function, Convert to Audio, is subscribed to your SNS topic and is triggered whenever a new message appears (which means that a new post should be converted into an audio file).

6 The Convert to Audio Lambda function uses Amazon Polly to convert the text into an audio file in the specified language (the same as the language of the text).

7 The new MP3 file is saved in a dedicated S3 bucket.

8 Information about the post is updated in the DynamoDB table. The URL to the audio file stored in the S3 bucket is saved with the previously stored data.
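The eight steps above can be sketched as two small Python functions, one per Lambda function. This is a simplified illustration, not the code used in the lab. The attribute names (`id`, `text`, `voice`, `status`, `url`), the status values, and the URL format are assumptions. `table` is assumed to behave like a boto3 DynamoDB `Table` resource, and `sns`, `polly`, and `s3` like the corresponding boto3 clients.

```python
import uuid


def new_post(event, table, sns, topic_arn):
    """Steps 2-4: store the post, then announce it on the SNS topic."""
    post_id = str(uuid.uuid4())
    table.put_item(Item={
        "id": post_id,
        "text": event["text"],
        "voice": event["voice"],
        "status": "PROCESSING",  # assumed marker for "audio not ready yet"
    })
    # Only the id is published; the subscriber looks up the rest.
    sns.publish(TopicArn=topic_arn, Message=post_id)
    return post_id


def convert_to_audio(event, table, polly, s3, bucket):
    """Steps 5-8: synthesize the post, upload the MP3, record its URL."""
    post_id = event["Records"][0]["Sns"]["Message"]
    post = table.get_item(Key={"id": post_id})["Item"]

    response = polly.synthesize_speech(
        Text=post["text"], OutputFormat="mp3", VoiceId=post["voice"])
    audio = response["AudioStream"].read()

    key = f"{post_id}.mp3"
    s3.put_object(Bucket=bucket, Key=key, Body=audio,
                  ContentType="audio/mpeg")
    # Assumed URL style; the object must be readable by the browser.
    url = f"https://{bucket}.s3.amazonaws.com/{key}"

    # "status" and "url" are DynamoDB reserved words, hence the aliases.
    table.update_item(
        Key={"id": post_id},
        UpdateExpression="SET #s = :s, #u = :u",
        ExpressionAttributeNames={"#s": "status", "#u": "url"},
        ExpressionAttributeValues={":s": "UPDATED", ":u": url},
    )
    return url
```

Because `new_post` returns as soon as the message is published, the caller does not wait for the audio conversion. The sketch sends the whole post to Polly in one request. Polly limits the input size per request, so a longer post would need to be split into parts and the audio concatenated.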

When the application retrieves information about posts:

1 The RESTful web service exposed by Amazon API Gateway provides the method for retrieving information about posts. The response contains the text of the post and the link to the MP3 file stored in the S3 bucket. The web service is invoked by a static webpage hosted on Amazon S3.

2 Amazon API Gateway invokes the Get Post Lambda function, which implements the logic for retrieving the post data.

3 The Get Post Lambda function retrieves information about the post (including the reference to Amazon S3) from the DynamoDB table and returns the information.
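The retrieval path can be sketched the same way. This is a minimal illustration, not the lab's code: the `postId` event field and the convention that "*" means "all posts" are assumptions, and `table` is assumed to behave like a boto3 DynamoDB `Table` resource.

```python
def get_post(event, table):
    """Return one post by id, or every post when the id is "*"."""
    post_id = event["postId"]
    if post_id == "*":
        # A single scan call returns at most one page of results.
        return table.scan()["Items"]
    # get_item omits the "Item" key when no post has this id.
    item = table.get_item(Key={"id": post_id}).get("Item")
    return [item] if item else []
```

Returning a list in both cases lets the webpage render one post or many with the same code. For a large table, the scan would need to follow pagination to return every post.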

Topics covered

By the end of this lab, you will be able to:

  • Create an Amazon DynamoDB table to store data
  • Create an Amazon API Gateway RESTful API
  • Create AWS Lambda functions triggered by API Gateway
  • Connect AWS Lambda functions with Amazon Simple Notification Service (SNS)
  • Use Amazon Polly to synthesize speech in a variety of languages and voices

Start Lab

  1. At the top of your screen, launch your lab by choosing Start Lab

This starts the process of provisioning your lab resources. An estimated amount of time to provision your lab resources is displayed. You must wait for your resources to be provisioned before continuing.

If you are prompted for a token, use the one distributed to you (or credits you have purchased).

  2. Open your lab by choosing Open Console

This automatically logs you in to the AWS Management Console.

Do not change the Region unless instructed.

Common Login Errors

Error: Federated login credentials

If you see this message:

  • Close the browser tab to return to your initial lab window
  • Wait a few seconds
  • Choose Open Console again

You should now be able to access the AWS Management Console.

Error: You must first log out

If you see the message, You must first log out before logging into a different AWS account:

  • Choose click here
  • Close your browser tab to return to your initial lab window
  • Choose Open Console again
