Transcription API
Transcription is a feature of the Agent Assist API that converts audio content (such as conversations, interviews, voice memos, or podcasts) into plain text. Once the audio is transcribed, you can apply additional Agent Assist API actions, such as summarization or classification, to review the transcript and extract actionable insights from it.
How It Works
-
Provide Audio Content
- If your audio files are not yet accessible via signed URLs, request signed URLs to upload your files to our storage.
- If your audio files are already hosted elsewhere and available for download via signed URLs, simply provide those URLs.
-
Choose Action Types
Specify the types of actions you want to perform in addition to transcription. You can choose from the following action types:
- transcription: Converts the audio files to text.
- factual_summary: Provides a summary focusing on objective facts and events from the conversation.
- detailed_summary: Generates a richer summary, offering more context and nuance from the conversation.
- technical_summary: Summarizes technical content from the conversation, useful for engineering or product discussions.
- classification: Classifies the audio into labels previously created via the Classification endpoint.
Each action is independent, and at least one action must be requested. For each action, you can specify metadata and a webhook to receive the result.
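As a rough illustration of the rules above, the action list could be assembled and checked client-side before it is sent. This is a minimal sketch, not the documented request schema: the key names `type`, `metadata`, and `webhook`, the helper functions, and the example webhook URL are all assumptions made for illustration. Only the action type names come from the list above.

```python
from typing import Any, Optional

# Action type names taken from the list above.
ALLOWED_ACTION_TYPES = {
    "transcription",
    "factual_summary",
    "detailed_summary",
    "technical_summary",
    "classification",
}


def build_action(
    action_type: str,
    metadata: Optional[dict[str, Any]] = None,
    webhook: Optional[str] = None,
) -> dict[str, Any]:
    """Build one action entry. The key names are assumed, not documented."""
    if action_type not in ALLOWED_ACTION_TYPES:
        raise ValueError(f"unknown action type: {action_type!r}")
    action: dict[str, Any] = {"type": action_type}
    if metadata is not None:
        action["metadata"] = metadata
    if webhook is not None:
        action["webhook"] = webhook
    return action


def build_actions(*actions: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect actions, enforcing the 'at least one action' rule locally."""
    if not actions:
        raise ValueError("at least one action must be requested")
    return list(actions)


# Each action is independent, so each one carries its own metadata and webhook.
actions = build_actions(
    build_action("transcription"),
    build_action(
        "factual_summary",
        metadata={"ticket": "example-123"},          # illustrative metadata
        webhook="https://example.com/hooks/summary",  # hypothetical URL
    ),
)
```

Validating locally only catches obvious mistakes early; the API remains the authority on which action inputs are accepted.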
-
Generate Result
Send a request to the POST transcriptionRequest endpoint. This will initiate the transcription process and return a unique process_cuid, which allows you to monitor the process through other transcription endpoints. Optionally, you can provide an external_ref (to link the process to a specific ID) and set a ttl (time to live for the process). You can also include a vocabulary list to assist the model in transcribing specific words.
-
Request Additional Actions
You can request additional actions by sending a request to POST transcriptionRequest/actions with the relevant action inputs. This works only if a ttl was set when the transcription request was created in the previous step and that ttl has not expired.
-
Retrieve Result
Call the GET transcriptionRequest endpoint with the process_cuid to get the status of the process and the results for all requested actions. You can also receive the result through the specified webhook.
-
Cancel Process
If needed, you can cancel the process at any time by sending a request to POST transcriptionRequest/cancel with the corresponding process_cuid.
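The four endpoint calls described in the steps above can be sketched as request builders. This is an assumption-heavy outline rather than a reference client: the base URL, the JSON field names other than `process_cuid`, `external_ref`, `ttl`, and `vocabulary`, the choice to pass `process_cuid` as a query parameter on the GET call, and the absence of authentication headers are all hypothetical. The functions only build requests and send nothing.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

# Hypothetical base URL; replace with the real Agent Assist API host.
BASE_URL = "https://api.example.com/agent-assist"


def _request(
    method: str, path: str, body: Optional[dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON request. Auth headers are omitted."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_request(
    audio_urls: list[str],
    actions: list[dict[str, Any]],
    external_ref: Optional[str] = None,
    ttl: Optional[int] = None,
    vocabulary: Optional[list[str]] = None,
) -> urllib.request.Request:
    """POST transcriptionRequest. 'audio_urls' and 'actions' are assumed keys."""
    body: dict[str, Any] = {"audio_urls": audio_urls, "actions": actions}
    if external_ref is not None:
        body["external_ref"] = external_ref
    if ttl is not None:
        body["ttl"] = ttl  # required later if more actions will be added
    if vocabulary:
        body["vocabulary"] = vocabulary
    return _request("POST", "transcriptionRequest", body)


def add_actions(
    process_cuid: str, actions: list[dict[str, Any]]
) -> urllib.request.Request:
    """POST transcriptionRequest/actions for a process whose ttl is still valid."""
    body = {"process_cuid": process_cuid, "actions": actions}
    return _request("POST", "transcriptionRequest/actions", body)


def get_result(process_cuid: str) -> urllib.request.Request:
    """GET transcriptionRequest; passing the id as a query string is assumed."""
    query = urllib.parse.urlencode({"process_cuid": process_cuid})
    return _request("GET", f"transcriptionRequest?{query}")


def cancel(process_cuid: str) -> urllib.request.Request:
    """POST transcriptionRequest/cancel for the given process."""
    return _request("POST", "transcriptionRequest/cancel", {"process_cuid": process_cuid})


# Illustrative values only; the ttl unit is not stated in this document.
create = create_request(
    audio_urls=["https://example.com/signed/audio.wav"],
    actions=[{"type": "transcription"}],
    external_ref="call-42",
    ttl=3600,
    vocabulary=["Agent Assist"],
)
status = get_result("abc123")
stop = cancel("abc123")
```

Once the base URL and authentication are filled in, each built request could be sent with `urllib.request.urlopen(req)`, and the `process_cuid` returned by the first call would replace the placeholder `"abc123"`.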