77 changes: 27 additions & 50 deletions .agents/skills/deepgram-java-audio-intelligence/SKILL.md
@@ -7,15 +7,10 @@ description: Use when writing or reviewing Java code in this repo that enables D…

Audio intelligence is not a separate client in this SDK. It is the **Listen V1 REST request surface** with additional analysis fields enabled.

**Use a different skill when:**
- Plain transcription only → `deepgram-java-speech-to-text`.
- Text (not audio) analysis → `deepgram-java-text-intelligence`.
- Turn-aware conversational streaming → `deepgram-java-conversational-stt`.

## Authentication

@@ -46,24 +41,22 @@

```java
ListenV1RequestUrl request = ListenV1RequestUrl.builder()
        // … builder options collapsed in this diff view …
        .build();
MediaTranscribeResponse result = client.listen().v1().media().transcribeUrl(request);
```
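For the overlays themselves, here is a hedged sketch of the same builder with intelligence fields enabled. The field names follow the gotchas below; the Boolean setter shapes, the model enum on the REST builder, and the sample URL are assumptions to verify against the generated `ListenV1RequestUrl` class:

```java
// Sketch only: enable REST-only intelligence overlays on the same request.
// Field names come from this skill's gotchas; the Boolean setters, model
// enum, and sample URL are assumptions to check against the generated builder.
ListenV1RequestUrl request = ListenV1RequestUrl.builder()
        .url("https://dpgr.am/spacewalk.wav") // hypothetical sample URL
        .model(ListenV1Model.NOVA3)           // gotcha 5: verified model
        .sentiment(true)
        .topics(true)
        .intents(true)
        .diarize(true)
        .redact("pci")                        // gotcha 4: single String, not a list
        .build();
MediaTranscribeResponse result = client.listen().v1().media().transcribeUrl(request);
```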

The concrete repo example (`examples/listen/AdvancedOptions.java`) demonstrates the same pattern for enabling higher-value Listen options via the builder. Always check the response for the intelligence fields you requested:

```java
result.visit(new MediaTranscribeResponse.Visitor<Void>() {
@Override
public Void visit(ListenV1Response response) {
response.getResults().getSentiments().ifPresent(s -> System.out.println("Sentiment: " + s));
return null;
}
@Override
public Void visit(com.deepgram.types.ListenV1AcceptedResponse accepted) {
System.out.println("Async accepted: " + accepted.getRequestId());
return null;
}
});
```
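The visitor is deliberate: it makes both response shapes (synchronous results and the async accepted response) explicit at compile time, so a call site cannot silently ignore one of them.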

## Quick start — WebSocket subset

@@ -94,29 +87,19 @@ In this Java checkout, the WebSocket connect options include `diarize`, `detectE…
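The body of this section is collapsed in the diff view; as a rough sketch, connecting with the real-time subset enabled might look like the following. The `diarize`/`detectEntities` option names come from the context line above, while the socket client and `V1ConnectOptions` builder shape are assumed to match the STT skill's WebSocket quick start:

```java
// Sketch: real-time analysis subset only. summarize/topics/intents/
// detectLanguage are REST-only (gotcha 2). wsClient is obtained as in the
// STT skill; the builder method names here are assumptions.
V1ConnectOptions options = V1ConnectOptions.builder()
        .model(ListenV1Model.NOVA3)
        .diarize(true)
        .detectEntities(true)
        .build();
wsClient.connect(options).get(10, java.util.concurrent.TimeUnit.SECONDS);
```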

## API reference (layered)

1. **In-repo source of truth**: `src/main/java/com/deepgram/resources/listen/v1/media/requests/` and `src/main/java/com/deepgram/resources/listen/v1/websocket/` plus `examples/listen/AdvancedOptions.java`.
2. **Canonical OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml
3. **Canonical AsyncAPI (WSS subset)**: https://developers.deepgram.com/asyncapi.yaml
4. **Product docs**: https://developers.deepgram.com/docs/stt-intelligence-feature-overview (links to individual feature docs for summarization, topics, intents, sentiment, language detection, redaction, diarization).

## Gotchas

1. **No separate “audio intelligence client”.** Everything hangs off Listen V1 request builders.
2. **Most intelligence fields are REST-only.** WebSocket connect options do not expose `summarize`, `topics`, `intents`, or `detectLanguage`.
3. **`summarize` on Listen V1 has its own generated type.** Do not assume the Read API shape is identical.
4. **`redact` is a single `String` field** on the REST builders, not a list like the Python SDK.
5. **Use `NOVA3` model** unless you have verified another model supports the overlays you need.
6. **Both URL and byte-upload builders expose intelligence fields.** Pick the builder that matches your input source.

## Example files in this repo

@@ -126,10 +109,4 @@

## Central product skills

For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.
71 changes: 36 additions & 35 deletions .agents/skills/deepgram-java-management-api/SKILL.md
@@ -7,15 +7,9 @@ description: Use when writing or reviewing Java code in this repo that calls Dee…

Administrative REST APIs for project metadata, project-scoped resources, and model discovery.

**Use a different skill when:**
- Live agent session → `deepgram-java-voice-agent`.
- Speech/text inference → use the STT, TTS, or Read product skills.

## Authentication

@@ -49,8 +43,6 @@ for (ListProjectsV1ResponseProjectsItem project : projects) {

## Quick start — project models / keys

```java
if (projects.isEmpty()) {
throw new IllegalStateException("No Deepgram projects are visible to this API key.");
}
// @@ -63,21 +55,38 @@ … unchanged lines collapsed in this view (they set up `projectId`) …
client.manage().v1().projects().members().list(projectId);
client.manage().v1().projects().members().invites().list(projectId);
client.manage().v1().projects().usage().get(projectId);
client.manage().v1().projects().billing().balances().list(projectId);

```

## Destructive operations — validate-then-act

```java
// 1. Verify the key exists before deleting
try {
var key = client.manage().v1().projects().keys().get(projectId, keyId);
// 2. Confirm identity before proceeding
System.out.printf("Deleting key: %s%n", key.getApiKeyId());
client.manage().v1().projects().keys().delete(projectId, keyId);
} catch (Exception e) {
System.err.println("Key not found or delete failed: " + e.getMessage());
}
```

## API surface (all under `client.manage().v1()`)

| Resource | Methods |
|----------|---------|
| `models()` | `list()`, `get(modelId)` |
| `projects()` | `list()`, `get`, `update`, `delete`, `leave` |
| `projects().keys()` | `list`, `create`, `get`, `delete` |
| `projects().members()` | `list`, `delete` |
| `projects().members().invites()` | `list`, `create`, `delete` |
| `projects().models()` | `list(projectId)` |
| `projects().usage()` | `get(projectId)` |
| `projects().billing().balances()` | `list(projectId)` |
| `projects().requests()` | subtree in generated surface |

Also: `client.agent().v1().settings().think().models().list()` for think-model discovery. Most clients expose `withRawResponse()` variants.
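A two-line sketch of the global versus project-scoped split from the table (using `var` as in the snippet above; the response types are left to the generated surface):

```java
// Public catalogue vs. what this specific project can actually use (gotcha 2).
var publicModels  = client.manage().v1().models().list();
var projectModels = client.manage().v1().projects().models().list(projectId);
```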

## API reference (layered)

@@ -92,12 +101,10 @@

## Gotchas

1. **Some example files are excluded from Gradle `compileExamples`** (`ListModels.java`, `MemberPermissions.java`, `UsageBreakdown.java`).
2. **Global vs project-scoped model discovery differ.** `models().list()` returns public models; `projects().models().list(projectId)` returns what a project can use.
3. **No Python-style persisted voice-agent configuration client** in this checkout. Do not promise `voice_agent.configurations.*`.
4. **The SDK is highly nested.** For invites: `projects().members().invites()`, not a top-level `invites()` client.

## Example files in this repo

@@ -112,10 +119,4 @@

## Central product skills

For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.
47 changes: 14 additions & 33 deletions .agents/skills/deepgram-java-speech-to-text/SKILL.md
@@ -7,15 +7,10 @@ description: Use when writing or reviewing Java code in this repo that calls Dee…

Basic transcription for prerecorded audio over REST or live audio over WebSocket via `/v1/listen`.

**Use a different skill when:**
- Summaries, sentiment, topics, diarization, or redaction overlays → `deepgram-java-audio-intelligence`.
- Turn-aware conversational streaming (`/v2/listen`) → `deepgram-java-conversational-stt`.
- Full interactive assistant with TTS + LLM → `deepgram-java-voice-agent`.

## Authentication

@@ -62,25 +57,12 @@

```java
// `request` is built earlier in this collapsed quick start (URL + model).
MediaTranscribeResponse result = client.listen().v1().media().transcribeUrl(request);
result.visit(new MediaTranscribeResponse.Visitor<Void>() {
@Override
public Void visit(ListenV1Response response) {
String transcript = response.getResults().getChannels().get(0)
.getAlternatives().orElse(java.util.Collections.emptyList())
.get(0).getTranscript().orElse("");
System.out.println(transcript);
return null;
}

@Override
public Void visit(com.deepgram.types.ListenV1AcceptedResponse accepted) {
System.out.println("Request accepted: " + accepted.getRequestId());
return null;
}
});
```

@@ -129,9 +111,14 @@

```java
wsClient.onResults(result -> {
// … collapsed: derive `transcript` and `isFinal` from the result, with an
// if-guard on non-empty text opening here …
System.out.printf("%s %s%n", isFinal ? "[final]" : "[interim]", transcript);
}
});
wsClient.onError(err -> System.err.println("WebSocket error: " + err.getMessage()));

try {
wsClient.connect(V1ConnectOptions.builder().model(ListenV1Model.NOVA3).build())
.get(10, TimeUnit.SECONDS);
} catch (Exception e) {
throw new RuntimeException("Failed to connect STT WebSocket", e);
}

// send raw audio chunks here
// wsClient.sendMedia(okio.ByteString.of(audioChunk));
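// A hedged sketch of that send loop, assuming a local file as the source; the
// three-argument okio.ByteString.of(byte[], offset, count) overload is standard.
try (java.io.InputStream in = java.nio.file.Files.newInputStream(java.nio.file.Path.of("audio.wav"))) {
byte[] buf = new byte[8192];
int n;
while ((n = in.read(buf)) != -1) {
wsClient.sendMedia(okio.ByteString.of(buf, 0, n));
}
} catch (java.io.IOException e) {
throw new java.io.UncheckedIOException(e);
}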
```

@@ -199,10 +186,4 @@ The async REST clients return `CompletableFuture<T>`. WebSocket clients are alre…
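A short sketch of that async path, mirroring the `asyncClient` pattern visible in the TTS skill's hunk context below (the async client construction and the exact future type are assumptions):

```java
// Hedged sketch: the same request surface on the async client returns
// CompletableFuture<T>; join() rethrows failures as CompletionException.
CompletableFuture<MediaTranscribeResponse> future =
        asyncClient.listen().v1().media().transcribeUrl(request);
MediaTranscribeResponse result = future.join();
```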

## Central product skills

For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.
23 changes: 6 additions & 17 deletions .agents/skills/deepgram-java-text-to-speech/SKILL.md
@@ -7,13 +7,8 @@ description: Use when writing or reviewing Java code in this repo that calls Dee…

Convert text to audio with REST or stream audio back incrementally over WebSocket via `/v1/speak`.

**Use a different skill when:**
- Full interactive assistant (listen + think + speak) → `deepgram-java-voice-agent`.

## Authentication

@@ -41,8 +36,9 @@

```java
SpeakV1Request request = SpeakV1Request.builder()
        // … collapsed here: at least .text(...), per gotcha 5 …
.build();

InputStream audioStream = client.speak().v1().audio().generate(request);
long bytes = Files.copy(audioStream, Path.of("output.mp3"), StandardCopyOption.REPLACE_EXISTING);
audioStream.close();
if (bytes == 0) throw new RuntimeException("TTS returned empty audio");
```

REST returns an `InputStream`, not JSON.
@@ -132,9 +128,4 @@ CompletableFuture<InputStream> future = asyncClient.speak().v1().audio().generat…
2. **Flush before close on WebSocket.** The example sends `Flush` before `Close` so the tail of the audio is not lost.
3. **Streaming audio arrives as binary `ByteString`.** Convert to bytes before writing or playback.
4. **WebSocket options are narrower than REST.** `container` and `bitRate` are REST request fields, not WebSocket connect options in this checkout.
5. **TTS defaults are minimal.** Pick an explicit model/encoding when output format matters.
6. **Async REST is `CompletableFuture<InputStream>`.** You still need to close the stream after the future resolves.
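A minimal sketch of gotcha 6 in practice, reusing the imports from the REST quick start (the `asyncClient` construction is assumed; the `generate` call shape comes from the hunk context above):

```java
// Hedged sketch: async TTS, then copy and close the stream (gotcha 6).
// try-with-resources closes the InputStream once the future resolves.
CompletableFuture<InputStream> future =
        asyncClient.speak().v1().audio().generate(request);
try (InputStream audio = future.join()) {
    Files.copy(audio, Path.of("output.mp3"), StandardCopyOption.REPLACE_EXISTING);
} catch (IOException e) {
    throw new UncheckedIOException(e);
}
```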

## Example files in this repo

@@ -144,10 +139,4 @@

## Central product skills

For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.