
Quick Start for Transcription

This guide walks you through the Transcription endpoint of the Agent Assist API. The endpoint transcribes audio recordings of conversations between agents and customers into text for further analysis, such as summarization. Transcription is processed asynchronously: you submit audio files and retrieve the result later, either by polling with a process_cuid or through a predefined webhook.

💡 For all the following endpoints, make sure to replace YOUR_BEARER_TOKEN, {companyCuid}, and {orchestratorCuid} with your actual values.


Step 0: Upload and Share Your Audio

Option 1
Upload your audio files to a storage service and generate publicly accessible signed URLs so that our system can download and process the audio.

Option 2
If your audio is not yet stored, request signed URLs for uploading your files through the signedUrls endpoint. This endpoint returns the requested number of signed URLs along with a process_cuid, which is used to track the transcription process.

⚠️ Warning: If you choose this option, make sure to upload your audio files using the returned signed URLs, in the exact order they were provided.
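The ordering requirement for Option 2 can be sketched as follows. This is a minimal illustration, not the official client: the helper names are hypothetical, and it assumes the signed URLs accept an HTTP PUT of the raw file bytes, which this guide does not confirm.

```python
import urllib.request
from pathlib import Path


def pair_files_with_urls(files, signed_urls):
    """Pair each local file with the signed URL at the same position."""
    if len(files) != len(signed_urls):
        raise ValueError(
            f"{len(files)} files but {len(signed_urls)} signed URLs"
        )
    # zip() preserves order: the first file goes to the first URL, and so on.
    return list(zip(files, signed_urls))


def upload_in_order(files, signed_urls):
    """Upload files one at a time, in the order the URLs were returned."""
    for path, url in pair_files_with_urls(files, signed_urls):
        # Assumption: the signed URL accepts a PUT with the raw audio bytes.
        request = urllib.request.Request(
            url, data=Path(path).read_bytes(), method="PUT"
        )
        with urllib.request.urlopen(request) as response:
            if response.status >= 300:
                raise RuntimeError(f"Upload of {path} failed: {response.status}")
```

Uploading sequentially rather than in parallel keeps the file-to-URL mapping easy to verify.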

Step 1: Prepare the Request

To initiate a transcription process, follow these steps:

  • If you uploaded your audio files via our API, provide the process_cuid returned by the signedUrls endpoint. Otherwise, provide the signed URLs of your audio files. Each file must be under 23 MB and in a supported format such as mp3 or wav. These URLs are used to download the audio files for processing.

  • Define the list of actions you want to perform. Available actions are:

    • transcription
    • factual_summary
    • detailed_summary
    • technical_summary

    For each action, you can optionally configure a webhook to be notified upon completion. The webhook configuration supports:

    • url (required): endpoint to receive the result
    • headers (optional): headers to use when sending the result to the webhook
    • external_metadata (optional): any custom metadata you'd like to include in the webhook payload
  • Optionally provide a vocabulary list to improve recognition of specific or domain-specific terms.

  • Set a ttl (Time To Live) value in minutes. This determines how long the process remains available after the first action completes, allowing you to submit additional actions without re-processing the audio. Default is 0 (no persistence).

  • You can also include an optional external_ref to link the process to your own internal identifiers.

Endpoint: POST /api/companies/{companyCuid}/orchestrators/{orchestratorCuid}/transcription

Example cURL Request with Payload:

curl -X POST https://api.dialonce.ai/agent-assist/v1/companies/{companyCuid}/orchestrators/{orchestratorCuid}/transcription \
-H "Authorization: Bearer YOUR_BEARER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
  "audio_files": [
    "https://storage.googleapis.com/my-bucket/audio-123.mp3"
  ],
  "actions": [
    {
      "action_type": "factual_summary",
      "webhook": {
        "url": "https://client-webhook.example.com/receive-summary",
        "headers": {
          "access_token": "abc123xyz"
        },
        "external_metadata": {
          "conversation_id": "conv-789",
          "source": "crm-system",
          "status": "new"
        }
      }
    }
  ],
  "vocabulary": ["DalirSicx", "metAl Plus"],
  "ttl": 0,
  "external_ref": "1234-abcd-56ef-7891"
}'
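
The same request can be assembled programmatically. The sketch below uses only the Python standard library and covers the signed-URL case; the function names are hypothetical, and the validation rules simply mirror the constraints listed above (known action types, a url for every configured webhook, a non-negative ttl).

```python
import json
import urllib.request

BASE_URL = "https://api.dialonce.ai/agent-assist/v1"
ALLOWED_ACTIONS = {
    "transcription", "factual_summary", "detailed_summary", "technical_summary",
}


def build_transcription_payload(audio_files, actions, vocabulary=None,
                                ttl=0, external_ref=None):
    """Validate the inputs and return the JSON-serializable request body."""
    for action in actions:
        if action.get("action_type") not in ALLOWED_ACTIONS:
            raise ValueError(f"Unsupported action_type: {action.get('action_type')!r}")
        webhook = action.get("webhook")
        if webhook is not None and "url" not in webhook:
            raise ValueError("webhook.url is required when a webhook is configured")
    if ttl < 0:
        raise ValueError("ttl must be zero or a positive number of minutes")

    payload = {"audio_files": list(audio_files), "actions": list(actions), "ttl": ttl}
    if vocabulary:
        payload["vocabulary"] = list(vocabulary)
    if external_ref is not None:
        payload["external_ref"] = external_ref
    return payload


def build_transcription_request(company_cuid, orchestrator_cuid, token, payload):
    """Build (but do not send) the POST request shown in the cURL example."""
    url = (f"{BASE_URL}/companies/{company_cuid}"
           f"/orchestrators/{orchestrator_cuid}/transcription")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. Keeping construction separate from sending makes the payload easy to inspect or log first.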

Step 2: Get the Request Status

Once your transcription request is submitted, you'll receive a process_cuid in the response. This identifier allows you to check the status and retrieve results for all actions (e.g., transcription, summaries) associated with the process.

You can poll the following endpoint using your process_cuid to get the current status and outputs of the requested actions.

Endpoint: GET /api/companies/{companyCuid}/orchestrators/{orchestratorCuid}/transcriptionRequests/{processCuid}

Example cURL:

curl -X GET https://api.dialonce.ai/agent-assist/v1/companies/{companyCuid}/orchestrators/{orchestratorCuid}/transcriptionRequests/{processCuid} \
-H "Authorization: Bearer YOUR_BEARER_TOKEN"

Response: If the request is successful, the response includes the status and results of each action.

{
  "process_cuid": "abcd-1234-efgh-5678",
  "created_at": "2025-01-01T12:00:00Z",
  "updated_at": "2025-01-01T14:00:00Z",
  "status": "started",
  "step": "transcribing",
  "vocabulary": "DalirSicx, metAl Plus, MeltiStake, TFGC, Q.U.A.R.T.Z.",
  "external_ref": "1234-abcd-56ef-7891",
  "actions": [
    {
      "action_type": "factual_summary",
      "created_at": "2025-01-01T12:15:00Z",
      "status": "pending",
      "error": null,
      "result": null,
      "webhook": "https://example.com/webhook/factual-summary",
      "external_metadata": {
        "conversation_id": "afvvf67890dfsf",
        "source": "workflow-manager",
        "status": "in-progress"
      }
    }
  ]
}

You can continue polling this endpoint until all requested actions are marked as finished. Alternatively, if you have configured a webhook, results are also sent directly to your endpoint when ready. The results of your process remain available for 24 hours.
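
A polling loop for this endpoint might look like the sketch below. It is illustrative only: fetch_status is a hypothetical callable that performs the GET request and returns the parsed JSON body, the interval and timeout defaults are arbitrary, and treating a non-null error field or a "canceled" process as a stopping condition is an assumption rather than documented behavior.

```python
import time


def all_actions_done(process):
    """True when every action is finished (or, by assumption, has an error)."""
    actions = process.get("actions", [])
    return bool(actions) and all(
        # Assumption: a non-null "error" means the action will not progress.
        a.get("status") == "finished" or a.get("error") is not None
        for a in actions
    )


def poll_until_done(fetch_status, interval_seconds=5.0, timeout_seconds=600.0,
                    sleep=time.sleep):
    """Call fetch_status() repeatedly until the process no longer changes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        process = fetch_status()
        if all_actions_done(process) or process.get("status") == "canceled":
            return process
        if time.monotonic() >= deadline:
            raise TimeoutError("process did not finish before the timeout")
        sleep(interval_seconds)
```

The sleep parameter is injectable so the loop can be exercised in tests without waiting.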

Step 3: Get Results

If you set the webhook parameter for a specific action, a POST request is sent to that webhook with the result in the following format:

{
  "process_cuid": "abcd-1234-efgh-5678", // Unique identifier of the process
  "action_type": "factual_summary", // Type of action performed (e.g., "transcription", "factual_summary")
  "success": true, // Whether the action completed successfully
  "data": { "markdown": "**text**", "html": "<p>text</p>" }, // Result data if success is true
  "error": "", // Error message if success is false
  "external_ref": "1234-abcd-56ef-7891", // Optional reference ID provided in the original request
  "external_metadata": {} // Optional metadata provided in the original request
}

Field              Type     Description
process_cuid       string   Unique identifier of the asynchronous process.
action_type        string   The type of action performed (e.g., transcription, factual_summary).
success            boolean  Indicates whether the action was completed successfully.
data               object   Contains the result of the action if success is true.
error              string   Error message if success is false; empty otherwise.
external_ref       string   (Optional) Reference you provided with the initial request.
external_metadata  object   (Optional) Metadata you provided with the initial request.

📌 Note: You can also access the results via the GET endpoint.
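
On the receiving side, a webhook handler needs to parse this payload and branch on success. The sketch below shows only that parsing step, with a hypothetical function name; how the raw body reaches it (web framework, authentication of the caller) is left out and depends on your stack.

```python
import json


def handle_webhook_payload(raw_body):
    """Parse a webhook POST body and return a small normalized summary."""
    payload = json.loads(raw_body)
    summary = {
        "process_cuid": payload["process_cuid"],
        "action_type": payload["action_type"],
        # Both external fields are optional, so fall back to None / {}.
        "external_ref": payload.get("external_ref"),
        "external_metadata": payload.get("external_metadata") or {},
    }
    if payload.get("success"):
        summary["result"] = payload.get("data")
    else:
        summary["error"] = payload.get("error") or "unknown error"
    return summary
```

Since external_ref and external_metadata are echoed back from the original request, they are a convenient way to match an incoming result to your own records.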

Step 4: Push New Actions

If needed, you can push new actions to your process without re-transcribing the audio files.

⚠️ You must set a ttl (time to live) in the initial request; new actions can only be added before it expires. After the ttl expires, the status of your entire process is set to "finished".

Endpoint: POST /api/companies/{companyCuid}/orchestrators/{orchestratorCuid}/transcriptionRequests/{processCuid}/action

Example cURL:

curl -X POST https://api.dialonce.ai/agent-assist/v1/companies/{companyCuid}/orchestrators/{orchestratorCuid}/transcriptionRequests/{processCuid}/action \
-H "Authorization: Bearer YOUR_BEARER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
  "action_type": "factual_summary",
  "webhook": "https://example.com/my/webhook",
  "external_metadata": {
    "conversation_id": "xyz-1234",
    "source": "user-interface"
  }
}'

Step 5: Cancel Process

At any point during the process, you can cancel all actions that have not started yet. This also sets the process status to canceled, and no further actions can be added. Results of actions already completed remain available through the corresponding GET endpoint.

Endpoint: POST /api/companies/{companyCuid}/orchestrators/{orchestratorCuid}/transcriptionRequests/{processCuid}/cancel

Example cURL:

curl -X POST https://api.dialonce.ai/agent-assist/v1/companies/{companyCuid}/orchestrators/{orchestratorCuid}/transcriptionRequests/{processCuid}/cancel \
-H "Authorization: Bearer YOUR_BEARER_TOKEN"
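
The push-action and cancel calls above can be wrapped in two small helpers. This is a sketch using only the Python standard library, with hypothetical function names. It builds the requests without sending them, and the body fields mirror the cURL examples exactly, including the webhook given as a plain URL string.

```python
import json
import urllib.request

BASE_URL = "https://api.dialonce.ai/agent-assist/v1"


def _process_url(company_cuid, orchestrator_cuid, process_cuid, suffix):
    """URL of a sub-resource of an existing transcription process."""
    return (f"{BASE_URL}/companies/{company_cuid}"
            f"/orchestrators/{orchestrator_cuid}"
            f"/transcriptionRequests/{process_cuid}/{suffix}")


def build_push_action_request(company_cuid, orchestrator_cuid, process_cuid,
                              token, action_type, webhook=None,
                              external_metadata=None):
    """Build the POST that adds one action to a process still within its ttl."""
    body = {"action_type": action_type}
    if webhook is not None:
        body["webhook"] = webhook
    if external_metadata is not None:
        body["external_metadata"] = external_metadata
    return urllib.request.Request(
        _process_url(company_cuid, orchestrator_cuid, process_cuid, "action"),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_cancel_request(company_cuid, orchestrator_cuid, process_cuid, token):
    """Build the body-less POST that cancels all actions not yet started."""
    return urllib.request.Request(
        _process_url(company_cuid, orchestrator_cuid, process_cuid, "cancel"),
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Either request is sent by passing it to urllib.request.urlopen.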
