
Transcription API

Transcription is a feature of the Agent Assist API that converts audio content, such as conversations, interviews, voice memos, or podcasts, into plain text. Once the audio is transcribed, you can apply other Agent Assist API actions, such as summarization or classification, to review the content and extract actionable insights from it.

How It Works

  1. Provide Audio Content

    • If your audio files are not already hosted online, request signed URLs and use them to upload the files to our storage.
    • If your audio files are already hosted elsewhere and available for download via signed URLs, simply provide those URLs.
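
    A minimal Python sketch of the upload path, assuming you already hold a signed upload URL that accepts an HTTP PUT of the raw file bytes (common for signed storage URLs, but not confirmed by this section). The `content_entry` helper and its `url` field are hypothetical; the real request field for an audio URL may be named differently. The upload request is built but not sent, so you can inspect it first.

```python
import urllib.request
from pathlib import Path


def build_upload_request(signed_upload_url: str, audio_path: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that uploads a local audio file to a signed URL.

    Assumption: the signed URL accepts a PUT of the raw bytes.
    """
    data = Path(audio_path).read_bytes()
    return urllib.request.Request(
        signed_upload_url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def content_entry(download_url: str) -> dict:
    """Reference audio that is already hosted (hypothetical field name "url")."""
    return {"url": download_url}
```

    Sending the upload is then `urllib.request.urlopen(req)`. Some storage providers reject the upload unless the `Content-Type` matches the one the URL was signed with, so adjust the header to your provider's rules.
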
  2. Choose Action Types

    Specify which actions to perform on the audio. You can choose from the following action types:

    • transcription: Converts the audio files to text.
    • factual_summary: Provides a summary focusing on objective facts and events from the conversation.
    • detailed_summary: Generates a richer summary, offering more context and nuance from the conversation.
    • technical_summary: Summarizes technical content from the conversation, useful for engineering or product discussions.
    • classification: Classifies the audio into labels previously created via the Classification endpoint.

    Each action is independent, and at least one action must be requested. For each action, you can specify metadata and a webhook to receive the result.
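
    The rules above (at least one action, each with optional metadata and a webhook) can be checked client-side before sending a request. This sketch assumes each action is a dict with a hypothetical `type` key plus optional `metadata` and `webhook` keys; the API's actual field names are not given in this section.

```python
# Action types as listed in this section.
ACTION_TYPES = frozenset({
    "transcription",
    "factual_summary",
    "detailed_summary",
    "technical_summary",
    "classification",
})


def build_actions(requested: list) -> list:
    """Validate requested actions and keep only the fields this sketch knows about.

    Each item is a dict with a hypothetical "type" key and optional
    "metadata" and "webhook" keys.
    """
    if not requested:
        raise ValueError("at least one action must be requested")
    actions = []
    for item in requested:
        action_type = item.get("type")
        if action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {action_type!r}")
        action = {"type": action_type}
        # metadata and webhook are optional and set independently per action.
        for key in ("metadata", "webhook"):
            if key in item:
                action[key] = item[key]
        actions.append(action)
    return actions
```

    For example, `build_actions([{"type": "transcription"}, {"type": "classification", "webhook": "https://hooks.example.com/classify"}])` requests a transcript plus a classification whose result goes to its own (hypothetical) webhook. Classification still requires the labels to exist beforehand.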

  3. Generate Result

    Send a request to the POST transcriptionRequest endpoint. This starts the transcription process and returns a unique process_cuid, which you use to track the process through the other transcription endpoints. Optionally, you can provide an external_ref to link the process to your own identifier, and set a ttl (time to live) for the process. You can also include a vocabulary list to help the model correctly transcribe specific words or terms.
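
    As a sketch of this step, the function below builds the POST transcriptionRequest call with Python's standard library. The base URL, the `content`, `actions`, and `vocabulary` field names, the JSON body layout, and the assumption that process_cuid is a top-level field of the response are all hypothetical; external_ref, ttl, and process_cuid are the names used in this section. Authentication headers are omitted because this section does not describe them.

```python
import json
import urllib.request
from typing import Iterable, Optional

BASE_URL = "https://api.example.com/v1"  # hypothetical base URL


def build_transcription_request(
    content_urls: Iterable[str],
    actions: list,
    external_ref: Optional[str] = None,
    ttl: Optional[int] = None,
    vocabulary: Optional[Iterable[str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST transcriptionRequest call."""
    body = {
        "content": [{"url": url} for url in content_urls],  # hypothetical field
        "actions": actions,  # hypothetical field; at least one action
    }
    # Optional fields are only included when provided.
    if external_ref is not None:
        body["external_ref"] = external_ref  # links the process to your own ID
    if ttl is not None:
        body["ttl"] = ttl  # time to live; units are not documented here
    if vocabulary is not None:
        body["vocabulary"] = list(vocabulary)  # hypothetical field name
    return urllib.request.Request(
        f"{BASE_URL}/transcriptionRequest",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def parse_process_cuid(response_body: bytes) -> str:
    """Extract process_cuid, assuming it is a top-level JSON field."""
    return json.loads(response_body)["process_cuid"]
```

    Keep the returned process_cuid (and, if you set one, the ttl and creation time); the later steps all depend on it.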

  4. Request Additional Actions

    To request additional actions after the process has started, send a request to POST transcriptionRequest/actions with the relevant action inputs. This requires that you set a ttl when creating the transcription request in step 3 and that the ttl has not yet expired.
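
    A sketch of a client-side guard for this step. It assumes the ttl is measured in seconds from the moment the request was created and that process_cuid is sent in the JSON body of POST transcriptionRequest/actions; neither detail is confirmed by this section, and the base URL is hypothetical.

```python
import json
import time
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com/v1"  # hypothetical base URL


def ttl_expired(created_at: float, ttl_seconds: Optional[float], now: Optional[float] = None) -> bool:
    """True if no ttl was set or it has elapsed (assumes seconds since creation)."""
    if ttl_seconds is None:
        return True  # without a ttl, additional actions cannot be requested
    current = time.time() if now is None else now
    return current >= created_at + ttl_seconds


def build_additional_actions_request(
    process_cuid: str,
    actions: list,
    created_at: float,
    ttl_seconds: Optional[float],
    now: Optional[float] = None,
) -> urllib.request.Request:
    """Build a POST transcriptionRequest/actions call, refusing if the ttl has lapsed."""
    if ttl_expired(created_at, ttl_seconds, now):
        raise RuntimeError(f"ttl for process {process_cuid} is unset or expired")
    # Hypothetical body layout: the process ID plus the new action inputs.
    body = {"process_cuid": process_cuid, "actions": actions}
    return urllib.request.Request(
        f"{BASE_URL}/transcriptionRequest/actions",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

    The local check only avoids requests that are certain to fail; the server remains the authority on whether the ttl has expired, so handle its error response as well.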

  5. Retrieve Result

    Call the GET transcriptionRequest endpoint with the process_cuid to get the status of the process and the results of all requested actions. If you specified a webhook for an action, its result is also delivered to that webhook.
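
    Retrieving results can be sketched as a polling loop. This assumes process_cuid is passed as a query parameter, that the response JSON carries a `status` field, and that `completed`, `failed`, and `cancelled` are terminal values; all of these, and the base URL, are hypothetical. The HTTP call is injected as `fetch` so the loop can run without a network. If you registered webhooks, you may not need to poll at all.

```python
import time
from typing import Callable
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1"  # hypothetical base URL

# Hypothetical terminal status values; check the real API for the actual ones.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def status_url(process_cuid: str) -> str:
    """GET transcriptionRequest URL, assuming process_cuid is a query parameter."""
    return f"{BASE_URL}/transcriptionRequest?{urlencode({'process_cuid': process_cuid})}"


def poll_until_done(
    fetch: Callable[[str], dict],
    process_cuid: str,
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(url) until the status is terminal or max_attempts is reached."""
    url = status_url(process_cuid)
    for attempt in range(max_attempts):
        result = fetch(url)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"process {process_cuid} not finished after {max_attempts} polls")
```

    In real use, `fetch` would wrap `urllib.request.urlopen` (plus whatever authentication the API requires) and return the decoded JSON.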

  6. Cancel Process

    If needed, you can cancel the process at any time by sending a request to POST transcriptionRequest/cancel with the corresponding process_cuid.
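
    A sketch of the cancel call. The section names the endpoint but not where the ID goes, so this assumes process_cuid is sent in a JSON body; the base URL is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical base URL


def build_cancel_request(process_cuid: str) -> urllib.request.Request:
    """Build (but do not send) a POST transcriptionRequest/cancel call."""
    # Assumption: the process ID is passed in the JSON body.
    body = json.dumps({"process_cuid": process_cuid}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/transcriptionRequest/cancel",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

    Send it with `urllib.request.urlopen(build_cancel_request(cuid))`; a subsequent GET transcriptionRequest call should then report the process's final state.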
