Features Overview

This page describes what the Huh platform offers end users. Security and data handling are called out throughout; see also Security Features for a full technical overview.

Privacy and on-premise processing

Transcription, translation, analytics (via Ollama), and storage run on infrastructure you control. Audio, video, and transcripts are not sent to third-party cloud APIs for the core pipeline. Per-feature details are given below and in Security Features.


Documentation in the app

From the home screen, use Documentation to open the user and operator guides (this MkDocs site). The link is configured at deploy time (DOCS_URL in the frontend container environment, typically set in your production compose stack). If unset, the app defaults to a same-origin path /docs/.


Browser recording sessions

Record audio-only or video with audio directly in the browser, without installing software.

  • Progressive upload: While you record, audio/video is split into chunks and uploaded to your server, so a browser crash only risks losing the last short segment (see the sketch at the end of this section).
  • Pause and resume: You can pause and continue within the same session.
  • Recovery after reload: If you reload the page, the app can offer to resume the same session or transcribe as-is from what was already uploaded. The UI keeps the audio or video mode aligned with the session so that new segments match the existing recording format.
  • Transcription options: After you finish, you choose the language, model, and an optional deletion date, just as with an uploaded file.

Security

  • The browser sends media only to your Huh backend, over the transport your site uses (use HTTPS in production).
  • Chunks are stored server-side until the session is finalized or discarded; follow your organization’s retention and access policies.
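
To make the progressive-upload and finalization behaviour concrete, here is a minimal backend sketch, assuming a FastAPI-style service; the endpoint paths, file layout, and chunk format are illustrative assumptions, not Huh's actual API.

from pathlib import Path

from fastapi import FastAPI, UploadFile

app = FastAPI()
CHUNK_ROOT = Path("/data/recording-sessions")  # assumed local staging area

@app.post("/recordings/{session_id}/chunks/{index}")
async def upload_chunk(session_id: str, index: int, chunk: UploadFile):
    # Each chunk is persisted as soon as it arrives, so a browser crash only
    # loses what was recorded after the last successfully uploaded chunk.
    session_dir = CHUNK_ROOT / session_id
    session_dir.mkdir(parents=True, exist_ok=True)
    (session_dir / f"{index:06d}.webm").write_bytes(await chunk.read())
    return {"stored": index}

@app.post("/recordings/{session_id}/finalize")
async def finalize(session_id: str):
    # On finalization the chunks are concatenated in upload order into a single
    # file that then enters the normal transcription pipeline.
    session_dir = CHUNK_ROOT / session_id
    parts = sorted(session_dir.glob("[0-9]*.webm"))
    (session_dir / "recording.webm").write_bytes(b"".join(p.read_bytes() for p in parts))
    return {"chunks": len(parts)}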

Speaker diarization

When supported by your deployment, transcriptions can include speaker labels (who spoke when), powered by WhisperX and optional pyannote diarization models running on your workers.

  • Minimum and maximum speaker counts can be set when starting transcription, within the limits your instance allows (see the sketch after this list).
  • “Auto” lets the pipeline infer an appropriate speaker count.
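
Under the hood this maps onto WhisperX's pyannote-based diarization. The following worker-side sketch shows the idea, assuming WhisperX is installed and the pyannote weights are available locally; the model names, device, file name, and speaker bounds are placeholders, and exact API names vary between WhisperX versions.

import whisperx

device = "cuda"  # or "cpu", depending on your worker hardware
audio = whisperx.load_audio("session.wav")

# 1. Transcribe with a local Whisper model.
model = whisperx.load_model("large-v3", device)
result = model.transcribe(audio)

# 2. Align words to timestamps, then assign speakers via pyannote diarization.
align_model, metadata = whisperx.load_align_model(result["language"], device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

diarizer = whisperx.DiarizationPipeline(device=device)  # weights fetched at build time
# min/max correspond to the bounds chosen in the UI; omitting both means "Auto".
diarization = diarizer(audio, min_speakers=1, max_speakers=4)
result = whisperx.assign_word_speakers(diarization, result)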

Security

  • Diarization runs on your machines together with the transcription worker, not on a vendor SaaS.
  • Downloading some diarization models may require a Hugging Face token at image build time; that token is used only to fetch weights into your image or cache, not to send customer audio to Hugging Face at runtime. Your operator documents how your images were built.

Analytics

Open a completed transcription in the editor, then enable the Analytics panel. Analytics combines:

  1. Client-side modules (no LLM): metrics computed in your browser from the transcript JSON.
  2. Server-side LLM modules: your analytics worker sends the transcript to Ollama on your network and writes structured results back to Huh.

Session Overview (client-side)

  • Total duration, word count, and speaker count derived from segments.

Speaker Statistics (client-side)

  • Per-speaker word counts, talk time, and turn counts.
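
For illustration, the arithmetic behind both client-side modules is roughly the following (shown in Python for brevity; in the app it runs in your browser). The segment fields start, end, speaker, and text are assumptions about the transcript JSON shape.

from collections import defaultdict

def session_overview(segments):
    # Total duration, word count, and speaker count derived from segments.
    words = sum(len(s["text"].split()) for s in segments)
    duration = max(s["end"] for s in segments) - min(s["start"] for s in segments)
    speakers = {s.get("speaker", "UNKNOWN") for s in segments}
    return {"duration_s": duration, "words": words, "speakers": len(speakers)}

def speaker_statistics(segments):
    # Per-speaker word counts, talk time, and turn counts.
    stats = defaultdict(lambda: {"words": 0, "talk_time_s": 0.0, "turns": 0})
    previous = None
    for s in segments:
        speaker = s.get("speaker", "UNKNOWN")
        stats[speaker]["words"] += len(s["text"].split())
        stats[speaker]["talk_time_s"] += s["end"] - s["start"]
        if speaker != previous:  # a new turn starts whenever the speaker changes
            stats[speaker]["turns"] += 1
        previous = speaker
    return dict(stats)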

Session Summary (LLM, Ollama)

Goal: Concise clinical-style summary of the session.

Prompt template (transcript is inserted at the end; line breaks preserved):

You are a clinical documentation assistant. Summarize the following therapy session transcript concisely.

Include:
- Main topics discussed
- Key therapeutic interventions used
- Patient's emotional state and engagement level
- Any homework or action items assigned
- Overall session assessment

Transcript:
{formatted transcript with [time] Speaker: text lines}

Provide a concise clinical summary (200-400 words).
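
As a sketch of how an analytics worker might fill this template and query a local Ollama instance (the URL, model name, and {transcript} placeholder are assumptions, not Huh's actual code):

import requests

OLLAMA_URL = "http://ollama:11434/api/generate"  # reachable only inside your network

def format_transcript(segments):
    # Produces the "[time] Speaker: text" lines referenced by the template above.
    return "\n".join(
        f"[{s['start']:.1f}s] {s.get('speaker', 'SPEAKER')}: {s['text'].strip()}"
        for s in segments
    )

def summarize(segments, template, model="llama3.1"):
    prompt = template.replace("{transcript}", format_transcript(segments))
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]  # Ollama returns the completion under "response"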

Therapy Techniques (LLM, Ollama)

Goal: List techniques (e.g. CBT, MI, DBT) with segment references.

Prompt template:

You are a clinical psychology expert. Analyze the following therapy session transcript and identify specific therapy techniques used.

For each technique found, provide:
- The technique name and category (e.g., CBT, Motivational Interviewing, DBT, Psychodynamic, etc.)
- A brief description of how it was used
- The segment number(s) where it appears

Return your analysis as a JSON array with this format:
[
  {
    "technique": "Open-ended Questions",
    "category": "Motivational Interviewing",
    "description": "Therapist uses open questions to explore the patient's perspective",
    "segments": [3, 7, 15],
    "speaker": "SPEAKER_00"
  }
]

Transcript:
{numbered lines: [1] 0.0s Speaker: text ...}

Return ONLY the JSON array, no other text.
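
Local models do not always honour the "Return ONLY the JSON array" instruction, so a worker typically extracts and validates the array before storing results. A hedged sketch of that post-processing, with field names following the prompt above:

import json
import re

def parse_technique_response(raw: str):
    # Take the outermost [...] block, even if the model wrapped it in prose or fences.
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if not match:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Keep only entries that carry the fields the prompt asks for.
    return [i for i in items if isinstance(i, dict) and "technique" in i and "segments" in i]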

DBT Validation Levels (LLM, Ollama)

Goal: Find instances of Linehan’s six validation levels in the dialogue.

Levels (embedded in the prompt):

  1. Being Present: Mindful attention to the other person
  2. Accurate Reflection: Reflecting what was said without judgment
  3. Mind Reading / Articulating the Unverbalized: Naming unstated thoughts or feelings
  4. Understanding in Terms of History: Validating in light of past context
  5. Normalizing / Validating in Terms of Current Context: The response makes sense in the situation
  6. Radical Genuineness: Authentic, equal, human connection

Prompt template (levels expanded in full in the worker; transcript appended as numbered segments):

You are a DBT (Dialectical Behavior Therapy) expert. Analyze the following therapy session and identify instances of the 6 DBT validation levels.

The 6 DBT Validation Levels:
Level 1: Being Present - Paying attention, being mindful, and staying present with the other person.
Level 2: Accurate Reflection - Summarizing or reflecting back what the person has said without judgment.
... (levels 3–6 as in the application)

For each instance found, provide:
- The validation level (1-6)
- A brief explanation of how validation was demonstrated
- The segment number(s) where it appears
- Which speaker demonstrated the validation

Return your analysis as a JSON array:
[
  {
    "level": 2,
    "explanation": "...",
    "segments": [5, 6],
    "speaker": "SPEAKER_00"
  }
]

Transcript:
{numbered transcript}

Return ONLY the JSON array, no other text.
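
For completeness, a small illustrative helper for the "{numbered transcript}" placeholder used by the technique and validation prompts (same assumed segment fields as above); the numbering is what lets the model's "segments" references be mapped back to editor segments:

def format_numbered(segments):
    # "[n] <start>s Speaker: text" per segment, numbered from 1.
    return "\n".join(
        f"[{i}] {s['start']:.1f}s {s.get('speaker', 'SPEAKER')}: {s['text'].strip()}"
        for i, s in enumerate(segments, start=1)
    )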

Security (all analytics)

  • Ollama must be reachable only from your trusted network; the worker sends transcript text to your Ollama instance, not to public cloud LLM APIs.
  • Opening Analytics does not bypass access control: users need normal transcription access (ACL), and results are stored with the transcription in your database.
  • LLM output can be wrong or biased; use it as decision support, not as sole clinical or legal evidence.

Supervision

In the editor, open the Supervision panel to discuss a recording with colleagues:

  • Add comments tied to a timestamp on the waveform / timeline.
  • Reply in threads, edit your own comments, and delete where permitted.
  • Comments are visible to users who already have read access to the transcription (same access rules as viewing the session).
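
As an illustration of the data involved, a supervision comment might be stored in a shape like the following; the field names are hypothetical, not Huh's actual MongoDB schema.

supervision_comment = {
    "transcription_id": "mongo-object-id",  # ties the comment to one recording
    "timestamp_s": 734.2,                   # position on the waveform / timeline
    "author_id": "user-id-from-oidc",
    "text": "Note the reframing the therapist uses here.",
    "parent_id": None,                      # set to another comment's id for replies
    "created_at": "2024-05-01T12:00:00Z",
}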

Security

  • Supervision data is stored in your MongoDB with the rest of Huh; it is not shared with external services.
  • Only people who can read the transcription can read supervision comments for it.

Core platform features (summary)

Transcription

  • Upload & process audio/video files.
  • Real-time updates while jobs run.
  • Editor with synchronized video and text.
  • Export subtitles (e.g. VTT) in ZIP archives.
  • Translation via LibreTranslate on your infrastructure (see the sketch after this list).
  • Automatic deletion with optional email warnings.
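
Translation also stays on your infrastructure: the backend calls your LibreTranslate container over HTTP. A minimal sketch, assuming a hypothetical internal hostname:

import requests

LIBRETRANSLATE_URL = "http://libretranslate:5000/translate"  # internal hostname is an assumption

def translate(text: str, source: str = "en", target: str = "de") -> str:
    response = requests.post(
        LIBRETRANSLATE_URL,
        json={"q": text, "source": source, "target": target, "format": "text"},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["translatedText"]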

User management & ACL

  • OIDC / Keycloak login, roles, invitations, and optional registration approval.
  • ACL on transcriptions: read, write, create, delete, admin permissions for users and roles.
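
For illustration, a permission check over those ACL entries might look like the following; the storage shape and helper are hypothetical, not Huh's actual implementation.

PERMISSIONS = {"read", "write", "create", "delete", "admin"}

def has_permission(acl: dict, user_id: str, roles: set, permission: str) -> bool:
    # acl example: {"users": {"alice": ["read", "write"]}, "roles": {"therapist": ["read"]}}
    if permission not in PERMISSIONS:
        return False
    granted = set(acl.get("users", {}).get(user_id, []))
    for role in roles:
        granted |= set(acl.get("roles", {}).get(role, []))
    # "admin" on a transcription implies every other permission on it.
    return permission in granted or "admin" in granted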

Video access

  • Short-lived tokens and HTTP-only cookies for streaming; no anonymous direct download URLs.
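
The short-lived-token idea can be illustrated with a signed, expiring value delivered as an HTTP-only cookie; this is a sketch of the concept only, and the secret, lifetime, and helper names are assumptions rather than Huh's actual token format.

import hashlib
import hmac
import time

SECRET = b"replace-with-a-server-side-secret"  # never exposed to the browser
TOKEN_TTL_S = 300                              # assumed lifetime of five minutes

def issue_token(video_id: str) -> str:
    # Delivered as an HTTP-only cookie, so page scripts cannot read it and there
    # is no long-lived, shareable download URL.
    expires = int(time.time()) + TOKEN_TTL_S
    payload = f"{video_id}:{expires}"
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{signature}"

def verify_token(token: str, video_id: str) -> bool:
    try:
        vid, expires, signature = token.rsplit(":", 2)
    except ValueError:
        return False
    expected = hmac.new(SECRET, f"{vid}:{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected) and vid == video_id and int(expires) > time.time()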

Instance settings & notifications

  • Admins configure registration, emails, and templates; users receive approval, invitation, password reset, and deletion notices as configured.

Technical stack (operator summary)

REST APIs, WebSockets, MongoDB, MinIO, RabbitMQ, local Whisper-based transcription, local LibreTranslate, optional local Ollama for analytics — all under your control. See Developer setup and Architecture for details.