diff --git a/fern/assistants/pronunciation-dictionaries.mdx b/fern/assistants/pronunciation-dictionaries.mdx
index e1f830d22..7168c7c55 100644
--- a/fern/assistants/pronunciation-dictionaries.mdx
+++ b/fern/assistants/pronunciation-dictionaries.mdx
@@ -6,13 +6,21 @@ slug: assistants/pronunciation-dictionaries
## Overview
-Pronunciation dictionaries allow you to customize how your AI assistant pronounces specific words, names, acronyms, or technical terms. This feature is particularly useful for ensuring consistent pronunciation of brand names, proper nouns, or industry-specific terminology that might be mispronounced by default.
+Pronunciation dictionaries allow you to customize how your AI assistant pronounces specific words, names, acronyms, or technical terms. This is particularly useful for ensuring consistent pronunciation of brand names, proper nouns, or industry-specific terminology that might be mispronounced by default.
-Pronunciation dictionaries are supported by the following voice providers:
+## Provider support
-- **ElevenLabs** — phoneme rules (IPA and CMU Arpabet) and alias rules
-- **Cartesia** — "sounds-like" aliases and IPA notation (sonic-3 model only)
-- **Vapi built-in voices** — pronunciation dictionaries via a unified locator
+Pronunciation dictionaries are supported on Cartesia, ElevenLabs, and Vapi built-in voices. The rule shape, required model, and field on `voice` differ by provider:
+
+| Provider | Rule shape | Required model | Field on `voice` | Cardinality |
+|---|---|---|---|---|
+| **Cartesia** | `items` with `{ text, alias }` | `sonic-3` (or any date-pinned variant such as `sonic-3-2026-01-12`) | `pronunciationDictId` (single upstream `pdict_*` ID) | One dictionary per voice |
+| **ElevenLabs** | `rules` with `alias` and `phoneme` entries | Phoneme rules require compatible ElevenLabs models such as `eleven_turbo_v2`, `eleven_turbo_v2_5`, or `eleven_flash_v2` | `pronunciationDictionaryLocators` (array of `{ pronunciationDictionaryId, versionId }`) | Multiple dictionaries per voice |
+| **Vapi built-in voices** | References Cartesia or ElevenLabs dictionaries | No provider-specific voice model requirement | `pronunciationDictionary` (array of locators) | Multiple dictionaries per voice |
+
+
+Cartesia pronunciation dictionaries require the `sonic-3` model or a date-pinned `sonic-3-*` variant. Older Cartesia models (`sonic-2`, `sonic-english`, etc.) reject `pronunciationDictId` on assistant create/update with a validation error.
+
## How Pronunciation Dictionaries Work
@@ -51,16 +59,30 @@ Corrected pronunciations:
## Prerequisites
-- A Vapi assistant configured with an **ElevenLabs**, **Cartesia**, or **Vapi** voice
-- For ElevenLabs: understanding of phonetic notation (IPA or CMU Arpabet) for phoneme-based rules
-- For Cartesia: the `sonic-3` voice model (pronunciation dictionaries are only available on sonic-3)
+- A Vapi assistant configured with a Cartesia, ElevenLabs, or Vapi built-in voice
+- For Cartesia: the voice must use the `sonic-3` model
+- For ElevenLabs: phoneme rules require `eleven_turbo_v2`, `eleven_flash_v2`, or another compatible model
+- Understanding of phonetic notation (IPA or CMU Arpabet) for phoneme-based rules
- Access to Vapi's API for dictionary creation
## Types of Pronunciation Rules
-### ElevenLabs Rules
+Cartesia and ElevenLabs use different rule shapes. Vapi built-in voices do not define their own rules; they reference dictionaries created through Cartesia or ElevenLabs.
+
+### Cartesia: alias items
-#### Phoneme Rules
+Cartesia dictionaries use a single `items` array. Each entry replaces a word or phrase with a pronunciation hint. Cartesia supports plain-English sounds-like aliases (e.g., `"vay-pee"`) and IPA notation wrapped in double angle brackets (e.g., `"<<ˈ|dʒ|ɪ|f>>"`):
+
+```json
+{
+ "text": "Vapi",
+ "alias": "vay-pee"
+}
+```
+
+Cartesia has no separate phoneme rule type. Write the desired pronunciation, whether a sounds-like spelling or IPA, directly in the `alias` field.
+
+### ElevenLabs: phoneme rules
Phoneme rules specify exact pronunciation using phonetic alphabets. These provide the most precise control over pronunciation.
@@ -73,278 +95,326 @@ Phoneme rules only work with specific ElevenLabs models:
- `eleven_turbo_v2`
- `eleven_flash_v2`
-#### Alias Rules
+### ElevenLabs: alias rules
Alias rules replace words with alternative spellings or phrases. These work with all ElevenLabs models and are useful for:
- Converting acronyms to full phrases (e.g., "UN" → "United Nations")
- Providing phonetic spellings for difficult words
- Standardizing pronunciation across different contexts
-### Cartesia Rules
+## Implementation
-Cartesia pronunciation dictionaries use a `text` and `alias` format. Each entry maps a word to its pronunciation. Cartesia supports two alias styles:
+
+
+
+
+ Use Vapi's API to create a Cartesia pronunciation dictionary.
-- **Sounds-like guidance**: A plain-English hint for how to say the word (e.g., `"VAH-pee"`)
-- **IPA notation**: Precise phonetic spelling wrapped in angle brackets (e.g., `"<<ˈ|v|ɑ|ˈ|p|i>>"`)
+ ```bash
+ POST https://api.vapi.ai/provider/cartesia/pronunciation-dictionary
+ Content-Type: application/json
+ Authorization: Bearer YOUR_API_KEY
+ ```
-
- Cartesia pronunciation dictionaries are only available with the `sonic-3` model.
-
+ ```json
+ {
+ "name": "My Cartesia Dictionary",
+ "items": [
+ { "text": "Vapi", "alias": "vay-pee" },
+ { "text": "VCS", "alias": "vee see ess" },
+ { "text": "API", "alias": "ay pee eye" }
+ ]
+ }
+ ```
-## Implementation
+ The API responds with a Vapi-wrapped envelope containing both a Vapi-side UUID and the upstream Cartesia resource ID:
-### ElevenLabs
+ ```json
+ {
+ "id": "d0ccf95c-2bd5-410a-8236-3432da032198",
+ "orgId": "YOUR_ORG_ID",
+ "provider": "cartesia",
+ "resourceName": "pronunciation-dictionary",
+ "resourceId": "pdict_xuvPYBguZ4cpdiakWM3dPN",
+ "resource": {
+ "id": "pdict_xuvPYBguZ4cpdiakWM3dPN",
+ "name": "My Cartesia Dictionary",
+ "items": [{ "text": "Vapi", "alias": "vay-pee", "pronunciation": "vay-pee" }]
+ }
+ }
+ ```
-
-
- Use Vapi's API to create a pronunciation dictionary with your custom rules.
+
+ The create response contains **two IDs**. When attaching the dictionary to your assistant, use the **upstream `resourceId`** (e.g. `pdict_xuvPYBguZ4cpdiakWM3dPN`), not the Vapi UUID in `id`. Using the Vapi UUID fails silently: the call does not fail, but the pronunciation rules are not applied.
+
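The ID selection described in the warning can be sketched in Python. This is a minimal illustration, assuming the create response has the envelope shape shown above. `cartesia_voice_config` is a hypothetical helper name, not part of any Vapi SDK, and the `pdict_` prefix check is an assumption based on the example IDs in this guide.

```python
def cartesia_voice_config(create_response: dict, voice_id: str, model: str = "sonic-3") -> dict:
    """Build a `voice` block that attaches a Cartesia dictionary by its upstream ID."""
    # Use the upstream Cartesia ID (`resourceId`), never the Vapi UUID in `id`.
    upstream_id = create_response["resourceId"]
    # Assumption: upstream Cartesia dictionary IDs start with "pdict_", as in the examples.
    if not upstream_id.startswith("pdict_"):
        raise ValueError(f"expected an upstream pdict_* ID, got {upstream_id!r}")
    return {
        "voice": {
            "provider": "cartesia",
            "model": model,
            "voiceId": voice_id,
            "pronunciationDictId": upstream_id,
        }
    }


# Trimmed copy of the example create response shown above.
response = {
    "id": "d0ccf95c-2bd5-410a-8236-3432da032198",
    "provider": "cartesia",
    "resourceId": "pdict_xuvPYBguZ4cpdiakWM3dPN",
}
config = cartesia_voice_config(response, "a0e99841-438c-4a64-b679-ae501e7d6091")
print(config["voice"]["pronunciationDictId"])
```

Raising on an unexpected ID format turns the otherwise silent misconfiguration into an immediate error on the client side.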
+
- ```bash
- POST https://api.vapi.ai/provider/11labs/pronunciation-dictionary
- Content-Type: application/json
- Authorization: Bearer YOUR_API_KEY
- ```
+
+ Update your assistant configuration to reference the Cartesia upstream `resourceId` via `voice.pronunciationDictId`. The voice model must be `sonic-3`.
- ```json
- {
- "name": "My Custom Dictionary",
- "rules": [
+ ```json
{
- "stringToReplace": "tomato",
- "type": "phoneme",
- "phoneme": "/tə'meɪtoʊ/",
- "alphabet": "ipa"
- },
+ "voice": {
+ "provider": "cartesia",
+ "model": "sonic-3",
+ "voiceId": "a0e99841-438c-4a64-b679-ae501e7d6091",
+ "pronunciationDictId": "pdict_xuvPYBguZ4cpdiakWM3dPN"
+ }
+ }
+ ```
+
+ Date-pinned `sonic-3-*` variants (such as `sonic-3-2026-01-12`) also accept the field. Older Cartesia models reject it.
+
+
+
+ Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.
+
+
+
+
+
+
+
+ Use Vapi's API to create an ElevenLabs pronunciation dictionary with your custom rules.
+
+ ```bash
+ POST https://api.vapi.ai/provider/11labs/pronunciation-dictionary
+ Content-Type: application/json
+ Authorization: Bearer YOUR_API_KEY
+ ```
+
+ ```json
{
- "stringToReplace": "Vapi",
- "type": "phoneme",
- "phoneme": "V AE P IY",
- "alphabet": "cmu-arpabet"
- },
+ "name": "My Custom Dictionary",
+ "rules": [
+ {
+ "stringToReplace": "tomato",
+ "type": "phoneme",
+ "phoneme": "/tə'meɪtoʊ/",
+ "alphabet": "ipa"
+ },
+ {
+ "stringToReplace": "Vapi",
+ "type": "phoneme",
+ "phoneme": "V AE P IY",
+ "alphabet": "cmu-arpabet"
+ },
+ {
+ "stringToReplace": "UN",
+ "type": "alias",
+ "alias": "United Nations"
+ }
+ ]
+ }
+ ```
+
+ The API responds with a Vapi-wrapped envelope. Both the upstream `pronunciationDictionaryId` and `versionId` (used to attach the dictionary) are inside the `resource` field:
+
+ ```json
{
- "stringToReplace": "UN",
- "type": "alias",
- "alias": "United Nations"
+ "id": "YOUR_VAPI_UUID",
+ "orgId": "YOUR_ORG_ID",
+ "provider": "11labs",
+ "resourceName": "pronunciation-dictionary",
+ "resourceId": "rjshI10OgN6KxqtJBqO4",
+ "resource": {
+ "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
+ "versionId": "xJl0ImZzi3cYp61T0UQG",
+ "name": "My Custom Dictionary"
+ }
}
- ]
- }
- ```
+ ```
- The API will respond with:
- ```json
- {
- "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
- "versionId": "xJl0ImZzi3cYp61T0UQG",
- "name": "My Custom Dictionary",
- "rules": [...],
- "createdAt": "2024-01-15T10:30:00Z"
- }
- ```
-
+
+ As with Cartesia, attach the dictionary using the upstream IDs (`pronunciationDictionaryId` and `versionId`), NOT the Vapi UUID.
+
+
-
- Update your assistant configuration to use the pronunciation dictionary.
+
+ Update your assistant configuration to use the pronunciation dictionary. ElevenLabs supports multiple dictionaries per voice via `pronunciationDictionaryLocators`.
- ```json
- {
- "voice": {
- "model": "eleven_turbo_v2_5",
- "voiceId": "sarah",
- "provider": "11labs",
- "stability": 0.5,
- "similarityBoost": 0.75,
- "pronunciationDictionaryLocators": [
- {
- "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
- "versionId": "xJl0ImZzi3cYp61T0UQG"
+ ```json
+ {
+ "voice": {
+ "model": "eleven_turbo_v2_5",
+ "voiceId": "sarah",
+ "provider": "11labs",
+ "stability": 0.5,
+ "similarityBoost": 0.75,
+ "pronunciationDictionaryLocators": [
+ {
+ "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
+ "versionId": "xJl0ImZzi3cYp61T0UQG"
+ }
+ ]
}
- ]
- }
- }
- ```
+ }
+ ```
-
- When a pronunciation dictionary is added, SSML parsing will be automatically enabled for your assistant.
-
-
+
+ When a pronunciation dictionary is added, SSML parsing will be automatically enabled for your assistant.
+
+
-
- Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.
-
-
+
+ Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.
+
+
+
-### Cartesia
+
+
+
+ Create a pronunciation dictionary using the Cartesia or ElevenLabs API endpoints shown above.
-
-
- Use Vapi's API to create a Cartesia pronunciation dictionary.
+ Vapi built-in voices reference those provider dictionaries through `voice.pronunciationDictionary`; they do not have a separate dictionary creation endpoint.
+
- ```bash
- POST https://api.vapi.ai/provider/cartesia/pronunciation-dictionary
- Content-Type: application/json
- Authorization: Bearer YOUR_API_KEY
- ```
+
+ Add the pronunciation dictionary locator to your Vapi voice configuration.
- ```json
- {
- "name": "My Cartesia Dictionary",
- "items": [
+ ```json
{
- "text": "Vapi",
- "alias": "VAH-pee"
- },
- {
- "text": "Nginx",
- "alias": "Engine-X"
- },
+ "voice": {
+ "voiceId": "Elliot",
+ "provider": "vapi",
+ "pronunciationDictionary": [
+ {
+ "pronunciationDictId": "pdict_xuvPYBguZ4cpdiakWM3dPN"
+ }
+ ]
+ }
+ }
+ ```
+
+ For an ElevenLabs-backed dictionary, set `pronunciationDictId` to the upstream `pronunciationDictionaryId` and include the `versionId`:
+
+ ```json
{
- "text": "GIF",
- "alias": "<<ˈ|dʒ|ɪ|f>>"
+ "voice": {
+ "voiceId": "Elliot",
+ "provider": "vapi",
+ "pronunciationDictionary": [
+ {
+ "pronunciationDictId": "rjshI10OgN6KxqtJBqO4",
+ "versionId": "xJl0ImZzi3cYp61T0UQG"
+ }
+ ]
+ }
}
- ]
- }
- ```
+ ```
- The API will respond with a dictionary object containing an `id` you'll use in the next step.
-
+
+ Use upstream provider IDs in `pronunciationDictionary`, not the Vapi UUID returned as `id`. For Cartesia, use the `pdict_*` `resourceId`. For ElevenLabs, use the `pronunciationDictionaryId` and `versionId` from `resource`.
+
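As a rough sketch of the rule in the warning, the following Python turns a create-response envelope from either provider into a `voice.pronunciationDictionary` entry. It assumes the envelope shapes shown in the Cartesia and ElevenLabs steps; `vapi_locator` is a hypothetical helper name, not a Vapi SDK function.

```python
def vapi_locator(create_response: dict) -> dict:
    """Turn a create-response envelope into a `voice.pronunciationDictionary` entry."""
    provider = create_response["provider"]
    if provider == "cartesia":
        # Cartesia: the upstream pdict_* ID is in `resourceId`; no version is needed.
        return {"pronunciationDictId": create_response["resourceId"]}
    if provider == "11labs":
        # ElevenLabs: both upstream IDs live inside `resource`.
        resource = create_response["resource"]
        return {
            "pronunciationDictId": resource["pronunciationDictionaryId"],
            "versionId": resource["versionId"],
        }
    raise ValueError(f"unsupported provider: {provider!r}")


# Trimmed copies of the example create responses from the provider steps.
cartesia_response = {
    "id": "d0ccf95c-2bd5-410a-8236-3432da032198",
    "provider": "cartesia",
    "resourceId": "pdict_xuvPYBguZ4cpdiakWM3dPN",
}
elevenlabs_response = {
    "id": "YOUR_VAPI_UUID",
    "provider": "11labs",
    "resourceId": "rjshI10OgN6KxqtJBqO4",
    "resource": {
        "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
        "versionId": "xJl0ImZzi3cYp61T0UQG",
    },
}

voice_config = {
    "voice": {
        "voiceId": "Elliot",
        "provider": "vapi",
        "pronunciationDictionary": [vapi_locator(elevenlabs_response)],
    }
}
print(voice_config)
```

In both branches the Vapi UUID in `id` is deliberately never read.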
+
-
- Add the pronunciation dictionary ID to your Cartesia voice configuration.
+
+ Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.
+
+
+
+
- ```json
- {
- "voice": {
- "model": "sonic-3",
- "voiceId": "your-cartesia-voice-id",
- "provider": "cartesia",
- "pronunciationDictId": "dict_abc123"
- }
- }
- ```
-
+## Bring Your Own Key (BYOK)
-
- Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.
-
-
+Vapi-managed pronunciation dictionaries (created via the Vapi API as shown above) use Vapi's platform credentials with the upstream provider. If your organization has its own Cartesia or ElevenLabs credentials configured (BYOK), the lifecycle changes:
-### Vapi Built-in Voices
+
+
+ Organizations with Cartesia BYOK credentials must create, edit, and delete pronunciation dictionaries directly through Cartesia's API. Vapi's `POST/PATCH/DELETE /provider/cartesia/pronunciation-dictionary` endpoints will reject requests from BYOK orgs with the following error:
-
-
- Create a pronunciation dictionary using either the ElevenLabs or Cartesia API endpoints shown above. The dictionary ID from either provider can be used with Vapi built-in voices.
-
+ ```text
+ Found credentials for cartesia. Use cartesia's API with your own credentials to manage 'pronunciation-dictionary' resources.
+ ```
+
+ Once you have the dictionary ID from Cartesia (a `pdict_*` string), attach it to your Vapi assistant the same way as Vapi-managed dictionaries — set `voice.pronunciationDictId` to that ID. The dictionary itself lives on Cartesia's side; Vapi just references it.
+
+
+
+ Organizations with ElevenLabs BYOK credentials must create, edit, and delete pronunciation dictionaries directly through the ElevenLabs API or dashboard. Vapi's create, update, and delete endpoints reject requests from BYOK orgs with an error of the same form as for Cartesia.
-
- Add the pronunciation dictionary locator to your Vapi voice configuration.
+ Once you have the `pronunciationDictionaryId` and `versionId` from ElevenLabs, attach them to your Vapi assistant via `voice.pronunciationDictionaryLocators`:
```json
{
"voice": {
- "voiceId": "Elliot",
- "provider": "vapi",
- "pronunciationDictionary": [
+ "model": "eleven_turbo_v2_5",
+ "voiceId": "your-voice-id",
+ "provider": "11labs",
+ "pronunciationDictionaryLocators": [
{
- "pronunciationDictId": "pdict_abc123"
+ "pronunciationDictionaryId": "your-elevenlabs-dict-id",
+ "versionId": "your-elevenlabs-version-id"
}
]
}
}
```
+
+
-
- The `versionId` field is optional for Vapi voices. It is only required when referencing an ElevenLabs-backed dictionary.
-
-
-
-
- Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.
-
-
-
-## Using Your Own ElevenLabs Account (BYOK)
-
-If you're using your own ElevenLabs API key (Bring Your Own Key), you can create pronunciation dictionaries directly in your ElevenLabs account and reference them in Vapi:
-
-1. Create a pronunciation dictionary in your ElevenLabs account
-2. Note the `pronunciationDictionaryId` and `versionId` from ElevenLabs
-3. Use these IDs in your Vapi assistant configuration:
-
-```json
-{
- "voice": {
- "model": "eleven_turbo_v2_5",
- "voiceId": "your-voice-id",
- "provider": "11labs",
- "pronunciationDictionaryLocators": [
- {
- "pronunciationDictionaryId": "your-elevenlabs-dict-id",
- "versionId": "your-elevenlabs-version-id"
- }
- ]
- }
-}
-```
+For Vapi built-in voices, create the dictionary directly with the upstream provider when BYOK applies, then reference the upstream IDs in `voice.pronunciationDictionary`.
## Managing Pronunciation Dictionaries
-### ElevenLabs
+The management endpoints below apply to Cartesia and ElevenLabs dictionaries created through Vapi. Replace `{provider}` with `cartesia` or `11labs`. Vapi built-in voices reference those dictionaries and do not have separate management endpoints.
-#### List Your Dictionaries
+### List Your Dictionaries
```bash
-GET https://api.vapi.ai/provider/11labs/pronunciation-dictionary
+GET https://api.vapi.ai/provider/{provider}/pronunciation-dictionary
Authorization: Bearer YOUR_API_KEY
```
-#### Update Dictionary Rules
+### Update a Dictionary
```bash
-PATCH https://api.vapi.ai/provider/11labs/pronunciation-dictionary/{dictionaryId}
+PATCH https://api.vapi.ai/provider/{provider}/pronunciation-dictionary/{dictionaryId}
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
```
-```json
-{
- "rules": [
+Use the request body shape for the provider you are updating:
+
+
+
+ ```json
{
- "stringToReplace": "tomato",
- "type": "phoneme",
- "phoneme": "/tə'mɑːtoʊ/",
- "alphabet": "ipa"
+ "items": [
+ {
+ "text": "Vapi",
+ "alias": "VAH-pee"
+ }
+ ]
}
- ]
-}
-```
-
-### Cartesia
-
-#### List Your Dictionaries
+ ```
+
-```bash
-GET https://api.vapi.ai/provider/cartesia/pronunciation-dictionary
-Authorization: Bearer YOUR_API_KEY
-```
+
+ ```json
+ {
+ "rules": [
+ {
+ "stringToReplace": "tomato",
+ "type": "phoneme",
+ "phoneme": "/tə'mɑːtoʊ/",
+ "alphabet": "ipa"
+ }
+ ]
+ }
+ ```
+
+
-#### Update Dictionary Items
+### Delete a Dictionary
```bash
-PATCH https://api.vapi.ai/provider/cartesia/pronunciation-dictionary/{dictionaryId}
-Content-Type: application/json
+DELETE https://api.vapi.ai/provider/{provider}/pronunciation-dictionary/{dictionaryId}
Authorization: Bearer YOUR_API_KEY
```
-```json
-{
- "items": [
- {
- "text": "Vapi",
- "alias": "VAH-pee"
- }
- ]
-}
-```
+The `{dictionaryId}` here is the Vapi UUID (returned as `id` in the create response), not the upstream provider ID.
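A small Python sketch of how the management URLs are assembled, using the endpoint paths from this section. `dictionary_url` is a hypothetical helper, and the guard against `pdict_*` values is an assumption based on the Cartesia ID format in the examples; it only catches the Cartesia case of passing an upstream ID by mistake.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://api.vapi.ai"
PROVIDERS = {"cartesia", "11labs"}


def dictionary_url(provider: str, dictionary_id: Optional[str] = None) -> str:
    """Return the list URL, or the per-dictionary URL when an ID is given."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    url = f"{BASE_URL}/provider/{provider}/pronunciation-dictionary"
    if dictionary_id is None:
        return url  # collection endpoint, used for listing
    # The path parameter is the Vapi UUID (`id`), not the upstream provider ID.
    if dictionary_id.startswith("pdict_"):
        raise ValueError("pass the Vapi UUID from `id`, not the upstream Cartesia ID")
    return f"{url}/{quote(dictionary_id, safe='')}"


print(dictionary_url("cartesia"))
print(dictionary_url("cartesia", "d0ccf95c-2bd5-410a-8236-3432da032198"))
```

The returned URL is then used with `GET`, `PATCH`, or `DELETE` and the `Authorization` header shown in the requests above.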
## Best Practices
@@ -353,18 +423,24 @@ Authorization: Bearer YOUR_API_KEY
- **Order Matters**: Rules are applied in the order they appear in the dictionary. The first matching rule is used.
- **Testing**: Always test pronunciation changes with your specific voice and model combination.
- **Phoneme Accuracy**: Ensure proper stress marking for multi-syllable words when using phoneme rules.
-- **Model Compatibility**: ElevenLabs phoneme rules only work with `eleven_turbo_v2` and `eleven_flash_v2`. Cartesia pronunciation dictionaries require the `sonic-3` model.
+- **Model Compatibility**: Cartesia dictionaries require `sonic-3`; ElevenLabs phoneme rules require a compatible model such as `eleven_turbo_v2`, `eleven_turbo_v2_5`, or `eleven_flash_v2`.
+- **Use upstream provider IDs**: When attaching a dictionary to a voice, use the Cartesia `pdict_*` ID or the ElevenLabs `pronunciationDictionaryId` and `versionId`, not the Vapi UUID. Use the Vapi UUID only with Vapi's list, update, and delete endpoints.
## Common Issues
**Pronunciation Not Applied**
-- Verify you're using a compatible model (ElevenLabs phoneme rules need specific models; Cartesia needs `sonic-3`)
-- Check that the word to replace exactly matches the text in your content (case-sensitive)
-- Ensure the pronunciation dictionary is properly referenced in your voice configuration
+- Verify you're using a compatible model: `sonic-3` for Cartesia, or a compatible ElevenLabs model for phoneme rules
+- Check that the `text` (Cartesia) or `stringToReplace` (ElevenLabs) exactly matches the text in your content (case-sensitive)
+- Ensure you used the **upstream provider IDs** (not the Vapi UUID) when attaching the dictionary to your voice — see the warning callouts above
+- For Vapi built-in voices with an ElevenLabs-backed dictionary, include both `pronunciationDictId` and `versionId`
+- Confirm the pronunciation dictionary is properly referenced in your voice configuration
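
The case-sensitivity check in the list above can be approximated before deploying. This Python sketch flags dictionary terms that appear in a sample script only with different capitalization. `find_case_mismatches` is a hypothetical helper, and plain substring matching is a simplifying assumption; the providers' actual matching rules may differ.

```python
def find_case_mismatches(content: str, terms: list[str]) -> list[str]:
    """Return terms that occur in `content` only with different capitalization."""
    lowered = content.lower()
    return [
        term
        for term in terms
        # Exact match absent, but a case-insensitive match exists.
        if term not in content and term.lower() in lowered
    ]


# `text` (Cartesia) or `stringToReplace` (ElevenLabs) values from a dictionary.
terms = ["Vapi", "UN", "tomato"]
content = "Welcome to VAPI. The UN loves a good tomato."
print(find_case_mismatches(content, terms))  # ['Vapi']
```

A flagged term means either the dictionary entry or the assistant's text needs its capitalization adjusted for the rule to match.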
+
+**Provider create/update/delete returns a 4xx**
+- If your organization has Cartesia or ElevenLabs BYOK credentials configured, the management endpoints will reject your request. See the BYOK section above for the alternate flow
-**SSML Conflicts**
-- When pronunciation dictionaries are enabled, SSML parsing is automatically activated
+**SSML Conflicts (ElevenLabs only)**
+- When ElevenLabs pronunciation dictionaries are enabled, SSML parsing is automatically activated
- Ensure any existing SSML tags in your content are properly formatted
**Performance Impact**