Automated Transcription Service

Our Automated Transcription Service (ATS) is a collaboration between the Social Science Research Commons, UITS Research Technologies, and the SecureMyResearch team within the Center for Applied Cybersecurity Research. We designed this service to give social science researchers access to automated transcription that is approved for use with research data and does not require special technical skills. The service has been approved by IU's Data Stewards for use with critical research data, including research data containing protected health information (PHI).

This service is available to IU researchers at no cost. Read more about the service and submit a new project request below.

Researchers first submit a new project request form. Once the request is approved, we send instructions for securely sending audio files to the ATS Manager. We recommend that researchers start by sending one or two files and evaluating whether the resulting transcripts meet their needs.

The ATS Manager then submits these audio files for transcription (see "About the Technology," below).

Written transcripts are securely returned to researchers by the ATS Manager. Most transcripts are in Word document format, with time stamps, speaker recognition, and color-coding based on word confidence scores to aid in human review. For some languages, though, we can only provide transcripts in plain text format that may or may not include speaker recognition (see "Language support," below).

About the Technology

Our service is based primarily on Amazon Transcribe; we also use Google Cloud Speech-to-Text to support additional languages. We use these services through existing IU contracts and have obtained special provisional approval from IU's Data Stewards for our service to be used with research data and PHI. We created this service so that researchers can take advantage of these tools without having to obtain, manage, and seek approval for their own cloud computing accounts through IU's contracts, and without having to work directly with output in JSON format.

Language support

Amazon Transcribe supports a range of languages. These are listed in a drop-down menu in our project request form, and a full, up-to-date list is available on Amazon's website.

If a language is not supported by Amazon Transcribe, we can sometimes provide transcripts through Google Cloud Speech-to-Text. A list of languages supported by Google Cloud Speech-to-Text is available on Google's website. Note that for languages without support for speaker diarization (identifying which speaker is talking when), we cannot provide transcripts that include speaker recognition.

Both Amazon and Google are increasing their language support; check the links above for the most current information.

We have generated several English-language sample transcripts, which are available to view. We are grateful to the IU Bicentennial Oral History Project and the IU Libraries University Archives for providing the audio files used to generate them. You can listen to and read transcripts for hundreds of oral histories from the project at oralhistory.iu.edu.

Willis, Martha, June 21, 2010. Indiana University Bicentennial Oral History Project, IU Libraries University Archives, Bloomington.

Listen to the interview
View raw transcript produced by ATS

IU researchers have access to other automated transcription services that are approved for use with research data, including Microsoft Transcribe, Microsoft Teams transcription, and Zoom closed captioning. Read about how to use these services in the SecureMyResearch Cookbook Recipe, "Generate transcripts for study participant interviews."

Requirements and Instructions

Researcher Expectations

  • Researchers are responsible for meeting requirements for collecting and storing research data at IU, including obtaining IRB approval and storing data appropriately.
  • Researchers will be asked periodically to provide information on research outputs supported by this service (e.g., publications, presentations, and grant applications). This information will be used only to assess and report on the impact of this service and to help us continue offering it to researchers.
  • Researchers must also follow the requirements below. Failure to do so may result in loss of access to ATS.

Researchers must prepare all files as follows:

  • Each file name must start with the submitter's IU username, followed by an underscore. For example, if your IU username is jdoe, all file names must start with jdoe_.
  • Other than the IU username, file names must not contain identifying information, such as respondent or research site names.

If you need instructions on how to efficiently batch rename multiple files, please let us know.
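If you are comfortable running a short script, the renaming step can be automated. The Python sketch below is illustrative only and is not a tool provided by the ATS team: it assumes your audio files sit together in one folder, and the helper name `add_username_prefix`, the folder name `interviews`, and the username `jdoe` are hypothetical. It only adds the username prefix; you still need to make sure the rest of each file name contains no identifying information.

```python
from pathlib import Path


def add_username_prefix(folder, username, dry_run=False):
    """Hypothetical helper (not part of ATS): prefix each file in `folder` with `username_`.

    Files that already start with the prefix, and subfolders, are skipped.
    Returns a list of (old_name, new_name) pairs so the changes can be reviewed.
    """
    prefix = f"{username}_"
    changes = []
    # sorted() reads the whole directory listing before any file is renamed.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith(prefix):
            continue
        target = path.with_name(prefix + path.name)
        if target.exists():
            # Never overwrite an existing file.
            raise FileExistsError(f"{target} already exists; not overwriting")
        changes.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return changes
```

Running `add_username_prefix("interviews", "jdoe", dry_run=True)` first prints nothing and changes nothing; it just returns the planned renames so you can check them before calling it again without `dry_run`.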

If you will be sending your files via Secure Share (see below), you will also need to compress them into a single .zip file or similar archive format.
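Your operating system's built-in compression tool is enough for this step. For those who prefer scripting, here is a minimal Python sketch using the standard-library `zipfile` module. The helper name `zip_audio_files`, the archive name, and the list of audio extensions are assumptions for illustration, since this page does not specify which audio formats ATS accepts. Zipping does not encrypt anything by itself; secure transfer is handled by Secure Share.

```python
import zipfile
from pathlib import Path


def zip_audio_files(folder, zip_path, extensions=(".m4a", ".mp3", ".wav")):
    """Hypothetical helper: put the audio files directly inside `folder` into one .zip.

    `extensions` is an assumed list of audio formats; adjust it to your files.
    Returns the list of file names written to the archive.
    """
    folder = Path(folder)
    names = []
    # ZIP_DEFLATED applies compression, but formats such as .mp3 and .m4a are
    # already compressed, so the main benefit is having one file to upload.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix.lower() in extensions:
                # arcname stores only the file name, so local folder paths
                # (which might contain identifying information) are left out.
                zf.write(path, arcname=path.name)
                names.append(path.name)
    return names
```

For example, `zip_audio_files("interviews", "jdoe_batch1.zip")` would produce a single archive ready to upload. Storing only bare file names keeps the archive consistent with the file-naming requirements above.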

Audio files must be transferred securely to the ATS Manager.

  • We recommend using Secure Share (https://secureshare.iu.edu/). Instructions for uploading and sharing files are available in the IU Knowledge Base. To send multiple files efficiently, compress them into a single .zip file or similar archive format.

  • We can also accept files through a secure Microsoft Team. If you prefer this option, let us know: we can set up a secure folder to share with you, or you can temporarily share a folder on your own existing secure storage with the ATS Manager. Audio files will be deleted from this secure folder once they have been transferred.

Transcripts will be securely returned to researchers using the same method the researcher used to transfer audio files (e.g., Secure Share or Microsoft Secure Storage at IU). Our goal is to return files within one business day.

  • When transferring via Secure Share, you will receive an automated link from Secure Share, and the ATS Manager will separately send you an encryption password. If you have trouble clicking the link to access the secure file download, try copying and pasting the link into a browser window.
  • When transferring via a secure Microsoft Team, we will deposit transcript files in the same folder that was used to transfer audio files to us. Researchers will receive an automated notice that new files are available. Once the transcript files have been retrieved, they should be deleted from this shared folder.

Team and Acknowledgments

Our Automated Transcription Service was built by Alan Walsh, Philip Berg, and Rebecca Haussin. Secure workflows were designed in collaboration with Will Drake, Tim Daniel, and Anurag Shankar. Emily Meanwell serves as the ATS Manager. 

We are grateful to the Indiana University Bicentennial Oral History Project and the IU Libraries University Archives, Bloomington, for the oral histories we use to develop and demonstrate this service, with special thanks to Kristin Browning Leaman, Rafal Swiatkowski, Carrie Schwier, and the participants of the Bicentennial Oral History Project.