Feature Proposal: Add FunASR Audio Transcription for Searchable Audio Archives

ArchiveBox archives web content including audio/video files. Adding speech-to-text for archived audio content would make it searchable alongside text content. FunASR (17.8K+ stars, https://github.com/modelscope/FunASR) provides:

- **SenseVoice**: Ultra-fast multilingual ASR (50x faster than Whisper-large)
- **Paraformer**: Production-grade ASR with timestamps and punctuation
- **OpenAI-compatible API**: POST /v1/audio/transcriptions
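
As a rough illustration of the last point, here is a minimal sketch of a client for that endpoint using only the Python standard library. The base URL `http://localhost:8000`, the model name `sensevoice`, and the function names are placeholders, not FunASR's documented values. The `file` and `model` form fields and the `text` key in the response follow the OpenAI transcription format, which this sketch assumes the FunASR server mirrors.

```python
import json
import uuid
import urllib.request


def build_transcription_request(
    audio: bytes,
    filename: str,
    base_url: str = "http://localhost:8000",  # assumed local server address
    model: str = "sensevoice",                # hypothetical model name
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # "model" form field
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'.encode(),
        # "file" form field header, followed by the raw audio bytes
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode(),
        audio,
        # closing boundary
        f'\r\n--{boundary}--\r\n'.encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(audio: bytes, filename: str, **kwargs) -> str:
    """Send the request and return the transcript text.

    Assumes an OpenAI-style JSON response of the form {"text": "..."}.
    """
    request = build_transcription_request(audio, filename, **kwargs)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["text"]
```

Splitting request construction from sending keeps the network call in one place, so the request shape can be checked without a running server.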

Use case: When archiving pages with embedded audio (podcasts, interviews, meeting recordings), FunASR can transcribe the audio and add the transcript to ArchiveBox's searchable index. This makes audio content as discoverable as text content.
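
One way this could work, sketched under assumptions: walk a snapshot directory, transcribe each audio file, and write the transcript as a plain-text sidecar file so that a file-based full-text search can pick it up. The function names, the `.transcript.txt` suffix, and the extension list are all hypothetical, and this is not ArchiveBox's actual extractor API. The `transcribe` callable is injected, so any backend (FunASR or otherwise) can be plugged in.

```python
from pathlib import Path
from typing import Callable, Iterator, List

# Assumed set of audio extensions worth transcribing (illustrative only)
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".opus"}


def find_audio_files(snapshot_dir: Path) -> Iterator[Path]:
    """Yield audio files under snapshot_dir, in a stable sorted order."""
    for path in sorted(snapshot_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            yield path


def write_transcripts(
    snapshot_dir: Path,
    transcribe: Callable[[Path], str],
) -> List[Path]:
    """Write a text sidecar next to each audio file and return the new paths.

    Files that already have a sidecar are skipped, so re-running is cheap.
    """
    written = []
    for audio_path in find_audio_files(snapshot_dir):
        out_path = audio_path.with_name(audio_path.name + ".transcript.txt")
        if out_path.exists():
            continue
        out_path.write_text(transcribe(audio_path), encoding="utf-8")
        written.append(out_path)
    return written
```

Writing plain-text sidecars avoids committing to a particular search backend. Whether ArchiveBox would index such files automatically is an open question for the maintainers.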

Since ArchiveBox is self-hosted and FunASR also runs locally, they integrate naturally without external API dependencies.

Would adding FunASR transcription for archived audio be useful?
