Improve organizational diversity, equity, and inclusion initiatives with Amazon Polly
Organizational diversity, equity, and inclusion (DEI) initiatives are a priority for companies across the globe. By building inclusive spaces with individuals from diverse backgrounds and experiences, businesses can better represent the needs of the societies they serve and deliver on their objectives. In the article “How Diversity Can Drive Innovation,” Harvard Business Review reports that employees of companies with multiple dimensions of diversity are 45% more likely to report that their firm’s market share grew over the previous year and 70% more likely to report that their firm captured a new market.
DEI initiatives can be complex to scale and may take a long time to show impact. For this reason, organizations should plan initiatives in phases, similar to an agile delivery process, so that small but meaningful wins at each phase contribute to larger organizational goals. One example of such an initiative at Amazon is the “Say my Name” tool.
Amazon’s global workforce, with offices in over 30 countries, requires continual innovation in inclusive tools that help counter unconscious bias. “Say my Name” was created to help Amazon employees share the correct pronunciation of their names and practice saying their colleagues’ names in a culturally competent manner. Mispronouncing a name can alienate team members and can hurt performance and team morale. A study by Catalyst.org reported that employees are more innovative when they feel more included; in India, 62% of innovation is driven by employee perceptions of inclusion. Adding a pronunciation guide to written names aims to create a more inclusive and respectful professional environment for employees.
The following screenshots show examples of pronunciations generated by “Say my Name”.
The application is powered by Amazon Polly, a text-to-speech (TTS) service that uses advanced deep learning technologies to synthesize natural-sounding human speech. Amazon Polly provides dozens of lifelike voices across a broad set of languages, so users can select the voice, language, and accent they would like to share with their colleagues.
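To see which voices and accents are available, an application can call the Amazon Polly DescribeVoices API (`describe_voices` in boto3, the AWS SDK for Python) and group the results for display. The following is a minimal sketch, assuming Python; `SAMPLE_RESPONSE` is a hand-written excerpt shaped like a real response rather than live output, and `voices_by_language` is a hypothetical helper, not part of the “Say my Name” code.

```python
from collections import defaultdict

# Hand-written excerpt shaped like a DescribeVoices response (illustrative only;
# a real response lists many more voices and fields per voice).
SAMPLE_RESPONSE = {
    "Voices": [
        {"Id": "Joanna", "Gender": "Female", "LanguageCode": "en-US", "LanguageName": "US English"},
        {"Id": "Matthew", "Gender": "Male", "LanguageCode": "en-US", "LanguageName": "US English"},
        {"Id": "Raveena", "Gender": "Female", "LanguageCode": "en-IN", "LanguageName": "Indian English"},
        {"Id": "Brian", "Gender": "Male", "LanguageCode": "en-GB", "LanguageName": "British English"},
    ]
}


def voices_by_language(response: dict) -> dict[str, list[str]]:
    """Group voice IDs under a 'Language name (code)' label, sorted for display."""
    groups: dict[str, list[str]] = defaultdict(list)
    for voice in response["Voices"]:
        label = f'{voice["LanguageName"]} ({voice["LanguageCode"]})'
        groups[label].append(voice["Id"])
    return {label: sorted(ids) for label, ids in sorted(groups.items())}


if __name__ == "__main__":
    # With AWS credentials configured, the live equivalent would be:
    #   import boto3
    #   response = boto3.client("polly").describe_voices()
    for label, ids in voices_by_language(SAMPLE_RESPONSE).items():
        print(label, ids)
```

Grouping by language label lets a front end offer an accent picker first and a voice picker second.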
In this post, we show how to deploy this name pronunciation application in your AWS environment, along with ways to scale the application across the organization.
The application follows a serverless architecture. The front end is a static React app hosted in an Amazon Simple Storage Service (Amazon S3) bucket and served through an Amazon CloudFront distribution. The backend consists of AWS Lambda functions behind Amazon API Gateway that call Amazon Polly. The React app is downloaded in full to the client and rendered in the web browser. The following diagram shows the solution architecture.
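The backend request flow described above, from API Gateway to a Lambda function to Amazon Polly, can be sketched as a single handler. This is a minimal illustration, assuming Python, a Lambda proxy integration that passes hypothetical `name` and `voice` query string parameters, and an API configured to return binary MP3 data; the parameter names, the voice allow-list, and `make_handler` are hypothetical and not taken from the “Say my Name” repository. The Polly call is injected so the validation logic can be exercised without AWS credentials.

```python
import base64
import json
from typing import Callable

# Hypothetical allow-list; a real deployment might derive this from DescribeVoices.
ALLOWED_VOICES = {"Joanna", "Matthew", "Raveena", "Brian"}
MAX_NAME_LENGTH = 100  # hypothetical limit to keep requests small


def polly_synthesize(text: str, voice_id: str) -> bytes:
    """Call Amazon Polly and return MP3 bytes (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the module runs without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    return response["AudioStream"].read()


def _error(status: int, message: str) -> dict:
    """Build a JSON error response in the Lambda proxy format."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": message}),
    }


def make_handler(synthesize: Callable[[str, str], bytes] = polly_synthesize):
    """Build a Lambda handler for an API Gateway proxy event."""

    def handler(event: dict, context: object) -> dict:
        params = event.get("queryStringParameters") or {}
        name = (params.get("name") or "").strip()
        voice = params.get("voice") or "Joanna"
        if not 0 < len(name) <= MAX_NAME_LENGTH:
            return _error(400, f"name must be 1-{MAX_NAME_LENGTH} characters")
        if voice not in ALLOWED_VOICES:
            return _error(400, f"unsupported voice: {voice}")
        audio = synthesize(name, voice)
        # Binary bodies are returned base64-encoded in a Lambda proxy response.
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "audio/mpeg"},
            "isBase64Encoded": True,
            "body": base64.b64encode(audio).decode("ascii"),
        }

    return handler


# The entry point Lambda would invoke; it calls Polly only when a request arrives.
handler = make_handler()
```

Injecting the synthesizer keeps request validation testable locally, while the default wiring still calls Amazon Polly in production.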
The site allows users to do the following:
Hear how their name and colleagues’ names sound with the different voices of Amazon Polly.
Generate MP3 files to put in email signatures or profiles.
Generate shareable links to provide colleagues or external partners with accurate pronunciation of names.
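A shareable link only needs to carry enough information for the front end to regenerate the audio, such as the name and the chosen voice. The following is a minimal sketch, assuming the front end reads hypothetical `name` and `voice` query parameters; the parameter names, both helper functions, and the placeholder CloudFront domain are assumptions, not the application’s actual link format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder domain; substitute the CloudFront URL from your own deployment.
BASE_URL = "https://d111111abcdef8.cloudfront.net"


def make_share_link(base_url: str, name: str, voice: str) -> str:
    """Encode a name and voice into a URL that a colleague can open."""
    query = urlencode({"name": name, "voice": voice})  # percent-encodes non-ASCII text
    return f"{base_url.rstrip('/')}/?{query}"


def read_share_link(url: str) -> tuple[str, str]:
    """Recover the name and voice from a link built by make_share_link."""
    params = parse_qs(urlsplit(url).query)
    return params["name"][0], params["voice"][0]


if __name__ == "__main__":
    link = make_share_link(BASE_URL, "Siobhán Ó Briain", "Brian")
    print(link)
    print(read_share_link(link))
```

Percent-encoding keeps accented and non-Latin names intact when a link is pasted into email or chat.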
To deploy the application in your environment, continue following along with this post.
You must complete the following prerequisites to implement this solution:
Install Node.js version 16.14.0 or above.
Install the AWS Cloud Development Kit (AWS CDK) version 2.16.0 or above.
Configure the AWS Command Line Interface (AWS CLI).
Install Docker and make sure the Docker daemon is running.
Install and configure Git.
The solution works best in the Chrome, Safari, and Firefox web browsers.
Implement the solution
To get started, clone the repository:
The repository consists of two main folders:
/cdk – Code to deploy the solution
/pronounce_app – Front-end and backend application code
We build the application components and then deploy them via the AWS CDK. To get started, run the following commands in your terminal window:
This step should produce the endpoints for your backend services using API Gateway. See the following sample output:
You can now deploy the front end:
This step should produce the URL for your CloudFront distribution, along with the S3 bucket storing your React application. See the following sample output:
You can validate that all the deployment steps worked correctly by navigating to the AWS CloudFormation console. You should see three stacks, as shown in the following screenshot.
To access “Say my Name”, use the value of the FrontendStack.CloudFrontReactAppURL AWS CDK output. Alternatively, on the AWS CloudFormation console, choose the FrontendStack stack, and on the Outputs tab, choose the value for CloudFrontReactAppURL.
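You can also read the URL programmatically. The AWS CDK CLI can write stack outputs to a JSON file with `cdk deploy --outputs-file outputs.json`, keyed by stack name and then output name. The following sketch assumes you deployed with that option and that the output names match those shown in this post; `get_stack_output` is a hypothetical helper. The CloudFormation DescribeStacks API returns the same values under each stack’s `Outputs` if you prefer to query AWS directly.

```python
import json
import sys


def get_stack_output(outputs: dict, stack: str, key: str) -> str:
    """Look up one value in the structure written by `cdk deploy --outputs-file`."""
    try:
        return outputs[stack][key]
    except KeyError:
        raise KeyError(f"{stack}.{key} not found in CDK outputs") from None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python get_url.py outputs.json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(get_stack_output(json.load(f), "FrontendStack", "CloudFrontReactAppURL"))
```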
You’re redirected to the name pronunciation application.
You can further customize the generated speech by using Speech Synthesis Markup Language (SSML) with Amazon Polly. SSML-enhanced text gives you additional control over how Amazon Polly generates speech from the text you provide.
For example, you can include a long pause within your text, or change the speech rate or pitch. Other options include:
Emphasizing specific words or phrases.
Using phonetic pronunciation.
Including breathing sounds.
Using the Newscaster speaking style.
For the full list of tags, see Supported SSML Tags.
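For name pronunciation, the most relevant tags are `phoneme` (to supply an explicit IPA transcription), `break` (to insert a pause), and `prosody` (to slow the speech rate). The following is a minimal sketch, assuming Python; `name_ssml` is a hypothetical helper, and the IPA string in the usage example is an illustrative transcription. Tag support differs between standard and neural voices, so check the Supported SSML Tags page for the voice you use.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def name_ssml(name: str, ipa: Optional[str] = None, rate: str = "slow", pause: str = "500ms") -> str:
    """Return SSML that says a name, pauses, then repeats it at a slower rate."""
    spoken = escape(name)  # escape &, <, > so the name cannot break the markup
    if ipa:
        # Override the default pronunciation with an explicit IPA transcription.
        spoken = f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{spoken}</phoneme>'
    ssml = (
        "<speak>"
        f"{spoken}<break time={quoteattr(pause)}/>"
        f"<prosody rate={quoteattr(rate)}>{spoken}</prosody>"
        "</speak>"
    )
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml


if __name__ == "__main__":
    print(name_ssml("Siobhán", ipa="ʃɪˈvɔːn"))
```

To synthesize the result, pass it to `synthesize_speech` with `TextType="ssml"`. Checking well-formedness locally surfaces markup errors before any request reaches Amazon Polly.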
Organizations have a responsibility to create more inclusive and accessible spaces as their workforces become increasingly diverse and global. There are many use cases for teaching the correct pronunciation of names in an organization:
Helping pronounce the names of new colleagues and team members.
Offering the correct pronunciation of your name via an MP3 or audio stream prior to meetings.
Providing sales teams mechanisms to learn names of clients and stakeholders prior to customer meetings.
Although this is a small step in creating a more equitable and inclusive workforce, accurate name pronunciations can have profound impacts on how people feel in their workplace. If you have ideas for features or improvements, please raise a pull request on our GitHub repo or leave a comment on this post.
About the Authors
Aditi Rajnish is a second-year software engineering student at University of Waterloo. Her interests include computer vision, natural language processing, and edge computing. She is also passionate about community-based STEM outreach and advocacy. In her spare time, she can be found playing badminton, learning new songs on the piano, or hiking in North America’s national parks.
Raj Pathak is a Solutions Architect and technical advisor to Fortune 50 and mid-sized FSI (banking, insurance, capital markets) customers across Canada and the United States. Raj specializes in machine learning, with applications in document extraction, contact center transformation, and computer vision.
Mason Force is a Solutions Architect based in Seattle. He specializes in analytics and helps enterprise customers across the western and central United States develop efficient data strategies. Outside of work, Mason enjoys bouldering, snowboarding, and exploring the wilderness across the Pacific Northwest.