Free and open-source software — v0.1.8

Transcribe interviews without sending your audio to the cloud.

Transcritório is a desktop app for automatic transcription and speaker separation in Brazilian Portuguese. Runs 100% on your machine — no login, no subscription, no data upload.

See all releases on GitHub →

Still evaluating? Keep scrolling — 30 seconds to see why researchers are moving away from cloud services.

Transcritório main screen showing waveform, turn table with interviewer and respondent, and block editor.

  • 🔒 100% local, no cloud
  • 🗣️ Automatic speaker separation
  • 🇧🇷 Native Brazilian Portuguese
  • 〰️ Waveform-synced editor
  • 📄 Multi-format export
  • 🆓 Free and open-source

How your working hours change

Without Transcritório

  • 8–10 hours manually transcribing a 1-hour interview.
  • Uploading confidential audio to foreign company servers.
  • R$ 100–300/month for online transcription services.
  • Labeling speakers by hand, line by line.
  • Explaining to the ethics board why audio went to the cloud.

With Transcritório

  • 15–30 minutes of processing, then just review.
  • Audio never leaves your computer.
  • Zero cost, forever. Open-source under MIT license.
  • Speakers auto-identified and renamable in one click.
  • Ready-to-paste text for your research protocol (right below).

Privacy by design, not by promise

  • 100% local processing: interview audio never reaches external servers.
  • No data collection, no telemetry: no signup or login required.
  • Open-source under MIT license: auditable by anyone.
  • Compatible with GDPR/LGPD and IRB requirements: full control over informants' audio.

Everything a qualitative researcher needs

Project file manager showing multiple interviews organized by status.

Project file manager

All of your project's audio files in a single screen, with visual status for every stage: not yet transcribed, queued, done, reviewed. You keep track of an entire project — dozens of interviews — without losing the thread of your work.

Audio player synced with waveform, showing current position in the interview.

Side-by-side audio review

Listen and correct at the same time. Each transcribed segment is anchored to the exact audio timestamp, and one click takes you there. Researchers review in about a third of the time it would take in a plain text editor.

Turn editor showing options to merge and split transcript blocks.

Turn editor with waveform

The waveform visually shows silences, overlaps, and speaker changes. Adjust segment boundaries, merge or split blocks with a click — useful for fast-paced or frequently interrupted interviews.
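
For readers who think in code, merging and splitting blocks can be pictured with a small data model. This is only an illustrative sketch: the `Turn` class and the `merge_turns` and `split_turn` helpers are hypothetical names, not Transcritório's internal API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One transcript block: a time span, a speaker label, and its text."""
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    speaker: str
    text: str


def merge_turns(first: Turn, second: Turn) -> Turn:
    """Join two adjacent blocks; the merged block keeps the first speaker."""
    if second.start < first.start:
        raise ValueError("turns must be passed in chronological order")
    return Turn(first.start, max(first.end, second.end), first.speaker,
                f"{first.text} {second.text}".strip())


def split_turn(turn: Turn, at_time: float, at_char: int) -> tuple[Turn, Turn]:
    """Cut one block at a timestamp (audio) and a character offset (text)."""
    if not turn.start < at_time < turn.end:
        raise ValueError("split time must fall strictly inside the turn")
    left = Turn(turn.start, at_time, turn.speaker, turn.text[:at_char].strip())
    right = Turn(at_time, turn.end, turn.speaker, turn.text[at_char:].strip())
    return left, right
```

Splitting needs two cut points, one in the audio and one in the text, which is why a waveform next to the transcript helps.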

Context menu on the turn table showing advanced editing options.

Analysis-ready turn table

Your transcript is organized into speaking turns with time and speaker metadata. Export to DOCX, MD, SRT, VTT, CSV, TSV or NVivo — and import straight into NVivo, Atlas.ti, MAXQDA, or an R/Python script.
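
As a rough idea of what a Python script can do with an exported turn table, the sketch below sums speaking time per speaker. The column names (`start`, `end`, `speaker`, `text`) and the sample rows are assumptions made for illustration; check the header of your own CSV export and adjust accordingly.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: these column names are assumed, not the app's documented schema.
SAMPLE_CSV = """start,end,speaker,text
0.0,4.2,Interviewer,Can you tell me about your work?
4.2,15.7,Joana,I have been teaching for ten years.
15.7,18.0,Interviewer,And before that?
18.0,31.5,Joana,Before that I worked in a clinic.
"""


def speaking_time(csv_file) -> dict[str, float]:
    """Sum turn durations (in seconds) per speaker."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["speaker"]] += float(row["end"]) - float(row["start"])
    return dict(totals)


totals = speaking_time(io.StringIO(SAMPLE_CSV))
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f} s")
```

To run it on a real export, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="", encoding="utf-8")`.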

How to use it, in four steps

  1. Create a project

    Pick a name for your project and a folder where files will be organized. Transcritório creates the folder structure for you.

    New project dialog. Project name selection screen.
  2. Add audio or video files

    Drag your files into the window. MP3, WAV, M4A, MP4 and other common formats are supported.

    Media file addition screen.
  3. Click transcribe

    Pick the language (Brazilian Portuguese) and number of speakers. Transcritório does the rest — on a typical laptop, it runs in about a third to half of the audio length.

  4. Review in the Studio and export

    Open the transcript, adjust segments, rename speakers (Interviewer, Joana, Pedro…) and export in your preferred format.
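
To make the last step concrete, here is a minimal sketch of what renaming speakers and exporting subtitles amounts to. It writes the standard SRT layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line). The turn tuples, the `SPEAKER_00` style labels, and the function names are illustrative assumptions, not the app's actual export code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(turns, names):
    """Render (start, end, label, text) turns as SRT, applying display names."""
    blocks = []
    for index, (start, end, label, text) in enumerate(turns, start=1):
        speaker = names.get(label, label)  # keep the raw label if not renamed
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(blocks)


# Hypothetical turns with automatic labels, plus the names chosen in review.
turns = [
    (0.0, 4.2, "SPEAKER_00", "Can you tell me about your work?"),
    (4.2, 15.75, "SPEAKER_01", "I have been teaching for ten years."),
]
names = {"SPEAKER_00": "Interviewer", "SPEAKER_01": "Joana"}
print(to_srt(turns, names))
```

Keeping the rename as a separate mapping means one change updates every turn by that speaker.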

Installing on your system

🪟 Windows 10/11
  1. Download Transcritorio-0.1.8-Setup.exe from the latest release.
  2. Run the installer. Windows Defender may show a blue warning — click "More info", then "Run anyway".
  3. Open Transcritório from the Start menu.
  4. Optional: if you have an NVIDIA graphics card, the app auto-detects it and offers to download GPU acceleration (~1 GB), making transcription up to 9× faster.
🍎 macOS (Apple Silicon)
  1. Download Transcritorio.dmg from the latest release.
  2. Drag Transcritório to the Applications folder.
  3. First launch: right-click the app icon and choose "Open", then click "Open" in the security dialog. If the app still won't launch (macOS 15 Sequoia blocks apps without a paid Apple Developer ID), open Terminal and drag the "Habilitar Transcritório.command" file from the mounted DMG into the Terminal window, then press Enter. Full instructions in docs/MAC_INSTALL.md.
  4. Automatic Metal acceleration (M1/M2/M3/M4): the app ships with the Apple GPU acceleration library bundled. A "Motor: MLX (Metal)" badge appears in the project header when acceleration is active. On the first transcription, the optimized model (~1.6 GB) is downloaded in the background.
  5. Everything bundled: ffmpeg and all dependencies ship inside the .dmg. Nothing to install via terminal.
🐧 Linux
  1. Download Transcritorio-x86_64.AppImage from the latest release.
  2. In the terminal, grant execute permission: chmod +x Transcritorio-*.AppImage
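  3. Install the required system libraries (Ubuntu/Debian): sudo apt install libfuse2 libxcb-cursor0 libxcb-xinerama0 libxkbcommon-x11-0 (libfuse2 is the FUSE runtime that AppImages need; the others are X11 libraries). ffmpeg ships inside the AppImage, so no separate install is needed.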
  4. Run by double-clicking or ./Transcritorio-x86_64.AppImage. Tested on Ubuntu 22.04+ and Fedora 40+.

System requirements

Minimum
  • CPU: 4 cores
  • RAM: 8 GB
  • Disk: 5 GB free
  • Works — 1h of audio takes ~40–60 min.
Ideal
  • CPU: 8+ cores
  • RAM: 16 GB or more
  • GPU: NVIDIA 6 GB+ VRAM (or Apple Silicon)
  • 1h of audio in 5–10 min.
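
To plan a whole project, the figures above can be scaled to your total audio length. The sketch below assumes processing time grows linearly with audio duration, which is a simplification; the function name and the example project size are made up for illustration.

```python
# Minutes of processing per hour of audio, taken from the figures listed above.
PROFILES = {
    "minimum": (40, 60),
    "ideal": (5, 10),
}


def estimate_minutes(audio_hours: float, profile: str) -> tuple[float, float]:
    """Return the (low, high) processing time in minutes for a batch of audio."""
    low, high = PROFILES[profile]
    return audio_hours * low, audio_hours * high


# Example: a project with 12 interviews of 1.5 hours each.
total_hours = 12 * 1.5
for profile in PROFILES:
    low, high = estimate_minutes(total_hours, profile)
    print(f"{profile}: {low / 60:.1f} to {high / 60:.1f} hours of processing")
```

Real timings depend on audio quality, number of speakers, and what else the machine is doing, so treat the result as a planning range.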

The AI behind it

(technical details)

Whisper (OpenAI, 2022)

Speech recognition model trained on 680k hours of multilingual audio, including large amounts of Portuguese. Transcritório uses the large-v3 variant by default, delivering high accuracy even on noisy audio or regional accents. Runs locally via faster-whisper.
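
For the curious, this is roughly what calling faster-whisper directly looks like in Python. It is a minimal sketch of the library's public API, not Transcritório's own code; the helper names and the `device` and `compute_type` defaults are assumptions, and the library downloads the model on first use.

```python
def format_segments(segments):
    """Turn objects with .start, .end and .text into '[start - end] text' lines."""
    return [f"[{s.start:.2f} - {s.end:.2f}] {s.text.strip()}" for s in segments]


def transcribe_file(path, model_size="large-v3", device="cpu", compute_type="int8"):
    """Transcribe one file in Portuguese with faster-whisper (pip install faster-whisper)."""
    # Imported here so the formatting helper works without the library installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # language="pt" skips auto-detection; Whisper uses one code for all Portuguese.
    segments, info = model.transcribe(path, language="pt")
    # segments is a lazy generator; formatting it runs the actual transcription.
    return format_segments(segments)
```

Calling `transcribe_file("interview.wav")` (a placeholder file name) returns one timestamped line per recognized segment. Speaker labels are not part of this step.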

pyannote.audio (Bredin et al., 2020)

Library handling automatic speaker separation — identifying who spoke when. Uses neural networks to cluster similar voices across the interview. Works well even with 6–8 distinct participants, and also runs offline.
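
Speaker separation yields time spans per voice, while speech recognition yields timestamped text, so the two have to be combined. One common way, shown here as an illustrative sketch and not necessarily how Transcritório does it, is to give each text segment the speaker whose turn overlaps it the most. All names and sample values below are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, speaker_turns, unknown="UNKNOWN"):
    """Label each (start, end, text) segment with the speaker who overlaps it most."""
    labeled = []
    for start, end, text in segments:
        best_label, best_overlap = unknown, 0.0
        for t_start, t_end, label in speaker_turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_label, best_overlap = label, shared
        labeled.append((start, end, best_label, text))
    return labeled


# Hypothetical outputs of the recognition and diarization stages.
segments = [(0.0, 4.0, "Can you tell me about your work?"),
            (4.5, 9.0, "I have been teaching for ten years.")]
speaker_turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.5, "SPEAKER_01")]

for start, end, speaker, text in assign_speakers(segments, speaker_turns):
    print(f"{start:.1f}-{end:.1f} {speaker}: {text}")
```

Segments that straddle a speaker change go to whoever holds more of the span, which is one reason overlapping speech still benefits from manual review.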

Local processing

Both Whisper and pyannote run entirely on your computer: Whisper through faster-whisper, pyannote through PyTorch. No audio, text, or metadata ever leaves your machine. First launch downloads model weights (~3 GB); after that, the app runs without an internet connection.

Frequently asked questions

Do I need the internet to use it?

Only on first launch, to download models (~3 GB). After that, Transcritório works fully offline.

How accurate is the transcription?

On clean Brazilian Portuguese audio, accuracy ranges from 90% to 96% of words correct. Human review remains recommended, especially for technical terms, proper names and noisy passages.
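
If you want to measure accuracy on your own material, the usual metric is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the automatic transcript into a hand-corrected reference, divided by the number of reference words. The sketch below is a plain implementation of that definition. It assumes lowercase, whitespace-split words with no punctuation handling, and the example sentences are invented. How the figures above were computed is not stated here, so results may not be directly comparable.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] = edits needed to turn the reference words seen so far into hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1] / len(ref)


reference = "eu trabalho na escola desde dois mil e dez"
hypothesis = "eu trabalho na escola desde dois mil dez"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, words correct (approx.): {1 - wer:.1%}")
```

Reading `1 - WER` as "words correct" is an approximation, since inserted words count as errors too.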

What about interviews with strong regional accents?

Whisper large-v3 was trained with wide dialectal variation in Portuguese and handles Brazilian regional accents well. Accuracy drops are usually small (2–4 percentage points) compared to standard São Paulo/Rio speech.

My institution's IT blocks software installs. What do I do?

On Windows and Linux, Transcritório can run in portable mode from a user folder, without admin privileges (.AppImage on Linux; zipped build on Windows on request). As a last resort, request an IT exception citing the MIT license and public GitHub repo.

How do I cite Transcritório in a paper or thesis?

Barbosa, R. J. (2026). Transcritório: transcrição local de entrevistas em português brasileiro (v0.1.8) [Software]. IESP-UERJ/CERES.

@software{barbosa2026transcritorio,
  author    = {Barbosa, Rog{\'e}rio Jer{\^o}nimo},
  title     = {Transcrit{\'o}rio: transcri{\c{c}}{\~a}o local de entrevistas em portugu{\^e}s brasileiro},
  year      = {2026},
  version   = {0.1.8},
  publisher = {IESP-UERJ/CERES},
  license   = {MIT},
  url       = {https://github.com/antrologos/Transcritorio}
}

Do I need a Hugging Face token for the models?

Not in the standard flow. Transcritório ships with the required model components. A Hugging Face token is only required in advanced scenarios (e.g., manually using newer speaker-diarization models).

Does it work with interviews in other languages?

Yes. Whisper supports 90+ languages. Transcritório focuses on Brazilian Portuguese, but Spanish, English, French and others can be selected.

Can I transcribe focus groups with many participants?

Yes, up to about 8 speakers with good results. Beyond that, speaker separation starts mixing similar voices — manual label review in the editor is recommended.

Is the code really open? Can I audit it?

Yes. All source code is published on GitHub under MIT license. Anyone can read, modify, redistribute and verify the app's behavior.

Start today. Your audio stays with you.

Free, no signup, open source.

See all releases on GitHub