AI Video Editor Alternatives (Longform): TimeBolt vs Descript vs Gling vs Loom

Sep 17, 2025
TimeBolt vs Descript vs Gling


Last Update: October 13, 2025

 

Accuracy is everything in long-form video


Podcasts, webinars, and YouTube uploads often run 30, 60, or even 120 minutes. At that scale, small misses in silence or filler removal quickly compound into 10–15 minutes of content that your audience doesn’t want to sit through and that you have to fix by hand.

In Part One of the AI Video Editor Showdown, we tested short-form accuracy (93 seconds) and showed why even a few seconds of filler can ruin watchability. In Part Two, we test a 60-minute Zoom recording and add Gling to the mix. In Part Three, we test CapCut vs Premiere Pro vs TimeBolt (short- and long-form tests).

The verified results and files are presented below so others can reproduce the test.

 


For Context, Here are the Players:

  • Descript has raised around $100 million from investors that include OpenAI Startup Fund, Andreessen Horowitz, Redpoint, and Spark Capital. 

  • Gling is a newer AI editor built for long-form creators, focusing on podcasts and YouTube videos with automated silence and filler removal.

  • Loom was acquired by Atlassian for $975 million in 2023.

  • TimeBolt is rapid video communications software, bootstrapped since 2019 with no outside funding.


 

Summary of Findings

Each long-form test used the same 60-minute recording. Every tool processed it with its silence- and filler-removal automations enabled.

| Tool | Silence Found (min:sec) | Filler Found (count) | Total Waste Found (min:sec) | Misses (est.) | Review Time (min) | Fix Time (min) | Total Time to Fix (review + repair) |
|---|---|---|---|---|---|---|---|
| TimeBolt | 10:07 | 948 | 17:05 | 0 | 40.0 | 0.0 | 40.0 min |
| Descript | 7:26 | 623 | 13:56 | 693 | 47.9 | 52.0 | 99.9 min |
| Gling | 9:37 | 688 | 13:54 | 654 | 46.3 | 174.4 | 220.7 min |
| Loom | 2:45 | 92 | 2:45 | | 57.8 | Not Able | Can't Fix |

“Silence Found” = total silence automatically detected + cut.
“Filler Found” = detected filler words/phrases per JSON export.
“Total Waste Found” = sum of silence + filler removed automatically.
“Total Time to Fix” = review + repair. TimeBolt found and cut 100 % of waste on the first pass.
 
  • TimeBolt finished cleanly at 42 min 55 sec. It removed 17 min 05 sec of waste with no re-review or manual correction.

  • Descript left 448 filler words and 171 seconds of silence, adding roughly 52 minutes of manual repair for a single hour-long video.

  • Gling performed similarly, cutting more aggressively but still missing over 600 filler words and introducing false cuts that require inspection.

  • CapCut required the most correction time: 3 hours of manual cleanup to repair timeline gaps and missed phrases.

  • Loom produced the longest output (57:47). No repair was possible because it lacks edit controls.

With Turbo enabled by default, TimeBolt sets the real-world benchmark at 38:09. Compared to this, Descript and Gling add back 20–25% more runtime, the equivalent of 10–15 minutes of extra filler in every hour of content.
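
As a rough check of that arithmetic, the sketch below divides the 42 min 55 sec TimeBolt output from the summary by the TurboMode rate of 1.125x stated in the TimeBolt results, then compares the Descript (47:54) and Gling (46:18) final durations against the result. The helper names are illustrative, and the assumption that Turbo simply divides runtime by its rate is ours, not a documented detail of the tool.

```python
def to_seconds(clock: str) -> int:
    """Convert a 'min:sec' string such as '42:55' to whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def to_clock(total_seconds: float) -> str:
    """Format seconds as 'min:sec', rounded to the nearest second."""
    whole = round(total_seconds)
    return f"{whole // 60}:{whole % 60:02d}"


def extra_runtime_pct(other: str, benchmark_seconds: float) -> float:
    """Percent by which another tool's output runs longer than the benchmark."""
    return (to_seconds(other) / benchmark_seconds - 1) * 100


TURBO_RATE = 1.125  # TurboMode speed as stated in the article

# Assumption: playback sped up by TURBO_RATE shortens runtime by the same factor.
turbo_seconds = to_seconds("42:55") / TURBO_RATE
print("Turbo benchmark:", to_clock(turbo_seconds))  # 38:09

for tool, duration in {"Descript": "47:54", "Gling": "46:18"}.items():
    print(f"{tool}: +{extra_runtime_pct(duration, turbo_seconds):.1f}% runtime")
```

Under these assumptions both tools land at roughly a fifth to a quarter more runtime than the Turbo benchmark.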

 


 

What AI Missed

An unscripted 60-minute Zoom recording was run through each editor (TimeBolt, Descript, Gling, and Loom) using its default silence- and filler-removal automations. Each exported file was then analyzed in TimeBolt to find leftover filler and quantify what was not removed. The reverse timeline shows only what each tool failed to cut, so running each tool's JSON timeline data through TimeBolt let us measure precisely how much unnecessary content it left behind.

| Comparison | Formula | % More Total Waste Captured by TimeBolt |
|---|---|---|
| vs Descript | (17 / 11 − 1) × 100 | +55 % |
| vs Gling | (17 / 13 − 1) × 100 | +31 % |
| vs Loom | (17 / 3 − 1) × 100 | +467 % |

Values represent relative editing efficiency combining silence and filler detection accuracy.
TimeBolt detected ≈ 43 % more total waste than Descript and Gling on average, and over 5× more than Loom.
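
The comparison formula can be reproduced in a few lines. This is a minimal sketch using the rounded waste totals (in minutes) that appear in the formulas above; the function and variable names are illustrative.

```python
def pct_more_captured(timebolt_min: float, other_min: float) -> float:
    """Percent more total waste captured by TimeBolt than by another tool."""
    return (timebolt_min / other_min - 1) * 100


TIMEBOLT_WASTE_MIN = 17  # rounded total waste found by TimeBolt
other_waste_min = {"Descript": 11, "Gling": 13, "Loom": 3}  # rounded totals

gains = {
    tool: pct_more_captured(TIMEBOLT_WASTE_MIN, minutes)
    for tool, minutes in other_waste_min.items()
}

for tool, gain in gains.items():
    print(f"vs {tool}: +{gain:.0f} %")

# Average over the two transcript-based editors only.
average = (gains["Descript"] + gains["Gling"]) / 2
print(f"Descript/Gling average: +{average:.0f} %")
```

Rounded to whole percentages, this gives +55, +31, and +467, with a Descript/Gling average of about 43.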
  • Descript left 6 minutes 30 seconds of unnecessary content.

  • Gling (Bad Takes OFF) left 4 minutes 17 seconds of unnecessary content.

  • Loom left 15 minutes 30 seconds of unnecessary content.

  • TimeBolt had no waste file, because there was nothing left behind.

 

Descript Misses


Gling Misses

 

Loom Misses

*Data Verification: Each editor’s output was converted to JSON and re-evaluated in TimeBolt’s waveform timeline to ensure all silence and filler detections were measured accurately. The same JSON was re-imported into TimeBolt’s reverse timeline to verify the leftover waste time.

 


 

Accuracy and Re-Edit Overhead

In long-form editing, missed filler adds up to more than a few minutes. Every missed silence or filler word compounds over hours of footage, and that is the difference between watchable and unwatchable.

To measure post-edit work, we parsed the JSON output of each tool’s uncaught filler. Each miss represents a segment of silence longer than 0.3 seconds or a filler word that wasn’t cut. On average, each miss required 5–10 seconds of manual review and trimming.

This extra labor becomes real overhead:

| Tool | Review Speed | Missed Filler (count) | Missed Silence (sec) | Total Fix Actions | Review Time | Est. Repair Time | Total Edit + Cleanup Time |
|---|---|---|---|---|---|---|---|
| TimeBolt | 1.5× | 0 | 0 | 0 | 40.0 min | 0.0 min | 40.0 min ✅ |
| Descript | | 448 | 170.9 | 693 | 47.9 min | 52.0 min | 99.9 min |
| Gling | | 607 | 69.6 | 654 | 46.3 min | 174.4 min | 220.7 min |
| Loom | | | | | 57.8 min | N/A | No editing layer (57:47 output) |

“Review Time” = time to validate the first pass. “Repair Time” = estimated manual cleanup for missed silences and filler words (5–10 sec per fix).
“Total Edit + Cleanup” = combined workflow duration to reach final, watchable video from one hour of footage.
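
The last column is simply review time plus estimated repair time. A minimal sketch of that sum, using the figures reported in the table (the `Workflow` class and its field names are illustrative, not part of any tool):

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    tool: str
    review_min: float  # time to validate the first pass
    repair_min: float  # estimated manual cleanup

    @property
    def total_min(self) -> float:
        """Total edit + cleanup time, rounded to one decimal place."""
        return round(self.review_min + self.repair_min, 1)


workflows = [
    Workflow("TimeBolt", 40.0, 0.0),
    Workflow("Descript", 47.9, 52.0),
    Workflow("Gling", 46.3, 174.4),
]

baseline = workflows[0].total_min
for w in workflows:
    multiple = w.total_min / baseline
    print(f"{w.tool}: {w.total_min} min ({multiple:.1f}x the TimeBolt total)")
```

Loom is left out because the table reports no repair estimate for it.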


In short: TimeBolt didn’t just finish faster. It finished clean.
Every silence and filler word was removed automatically, which meant zero wasted re-edits later, a result no AI-driven transcript editor matched in this test.

 


 

Methodology

Baseline Establishment (via Umcheck)
- Tool: TimeBolt Umcheck (v7.0.4)
- Settings: Silence detection at 0.5s, “Look for Repeats” enabled

This is Umcheck, TimeBolt's à la carte AI transcription service. It is the only software that lets you add any unique word, tic, or phrase.

Process
1. Add file with Silence Detection Settings
2. Run Umcheck
3. Click “Look for Repeats”
4. Click “Turn Off Selected Words”
5. Export JSON and SRT

Baseline Results
- Dead air (≥0.5s): 10:07
- Filler words: 605
- Repeated words: 343
- Total flagged words: 948

Software Versions
- TimeBolt: v7.0.4
- Descript: latest release (Sept 16, 2025)
- Gling: latest release (Sept 16, 2025)

Downloads:
- Raw 60-minute Zoom video (59:58 total duration)
- Raw 60-minute Zoom SRT transcript


Validation & Ground Truth — TimeBolt Removal Summary

[Verified] Using the TimeBolt Umcheck JSON exported from this exact 60-minute recording, we computed totals directly from timestamps (end − start) for every removed token/phrase. Per this dataset, TimeBolt’s pass removed the following:

| Metric | Value |
|---|---|
| Segments removed (filler + repeats) | 1,005 |
| Total time removed | 320.03 s (~5.33 min) |
| Immediate word-repeat events (adjacent) | 97 |
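
For readers who want to recompute such totals from the downloadable JSON, the sketch below shows the general approach: sum `end − start` per removed segment and tally tokens by count and by seconds. The schema (a list of objects with `word`, `start`, and `end` keys) and the sample values are assumptions made for illustration; the real Umcheck export may be structured differently, so adjust the key names to match.

```python
import json
from collections import Counter, defaultdict

# Made-up sample in an ASSUMED schema; not data from the study.
SAMPLE_JSON = """
[
  {"word": "um",   "start": 1.20,  "end": 1.70},
  {"word": "yeah", "start": 4.05,  "end": 4.35},
  {"word": "Um",   "start": 9.10,  "end": 9.75},
  {"word": "you",  "start": 12.00, "end": 12.15},
  {"word": "know", "start": 12.15, "end": 12.40}
]
"""


def summarize(removed: list) -> dict:
    """Count removed segments and total their durations per token."""
    counts = Counter()
    seconds = defaultdict(float)
    for segment in removed:
        token = segment["word"].lower()
        counts[token] += 1
        seconds[token] += segment["end"] - segment["start"]
    by_seconds = sorted(seconds.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "segments": len(removed),
        "total_seconds": round(sum(seconds.values()), 2),
        "by_count": counts.most_common(),
        "by_seconds": [(tok, round(sec, 2)) for tok, sec in by_seconds],
    }


summary = summarize(json.loads(SAMPLE_JSON))
print(summary["segments"], "segments,", summary["total_seconds"], "s removed")
print("Top by count:", summary["by_count"][:3])
print("Top by seconds:", summary["by_seconds"][:3])
```

Tokens are lowercased before counting, which is one possible matching rule; a case-sensitive count would treat "Um" and "um" separately.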

Top Tokens by Count

| Token | Count |
|---|---|
| yeah | 137 |
| uh | 101 |
| um | 76 |
| I | 63 |
| you | 63 |
| so | 55 |
| know | 54 |
| and | 36 |
| ok | 33 |
| the | 21 |

Top Tokens by Total Seconds Removed

| Token | Total Seconds |
|---|---|
| um | 43.96 |
| uh | 42.29 |
| yeah | 39.07 |
| so | 17.94 |
| I | 15.84 |
| and | 14.61 |
| ok | 11.45 |
| know | 10.89 |
| you | 8.38 |
| the | 8.34 |

Common Adjacent Phrases (Bigrams)

[Verified] Frequent adjacent patterns removed in this pass include: “you know” (53), “yeah yeah” (27), “uh uh” (23), “uh yeah” (20), “um yeah” (16), “yeah so” (16), “i mean” (12).
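
One way to count adjacent patterns like these is sketched below. It assumes each removed token is available as a `(word_index, token)` pair, where `word_index` is the token's position in the full transcript, and treats two removed tokens as adjacent when their indices differ by one. Both the input shape and the sample data are hypothetical.

```python
from collections import Counter


def adjacent_bigrams(removed: list) -> Counter:
    """Count pairs of removed tokens that sit next to each other.

    `removed` is a list of (word_index, token) pairs sorted by word_index.
    """
    pairs = Counter()
    for (i, first), (j, second) in zip(removed, removed[1:]):
        if j == i + 1:  # consecutive positions in the transcript
            pairs[f"{first.lower()} {second.lower()}"] += 1
    return pairs


def immediate_repeats(pairs: Counter) -> int:
    """Number of adjacent pairs where the same token occurs twice in a row."""
    return sum(n for pair, n in pairs.items() if len(set(pair.split())) == 1)


# Made-up sample, not data from the study.
sample = [
    (3, "you"), (4, "know"),
    (10, "yeah"), (11, "yeah"), (12, "so"),
    (20, "you"), (21, "know"),
    (30, "um"),
]

bigrams = adjacent_bigrams(sample)
print(bigrams.most_common())
print("Immediate repeats:", immediate_repeats(bigrams))
```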

Two-Way Controls (Replicability)

  • Negative control: Run the TimeBolt output through Descript and Gling. Expect 0 new filler/silence detections on this same clip.
  • Positive control: Run the Descript/Gling outputs back through TimeBolt. Measure additional filler/silence TimeBolt still finds.

Artifacts for Audit

  • Raw 60-minute video (MP4) — Download
  • Source SRT transcript (Amazon Transcribe) — Download
  • TimeBolt Umcheck JSON (this summary) — Download
  • TimeBolt Output (MP4) + SRT — Download
  • Descript Output (MP4/SRT) — Download
  • Gling Output (MP4/SRT) — Download

Note: Filler list, matching rules (whole-word vs subword), minimum silence length (≥0.5s), padding, and version numbers are documented above so anyone can reproduce the counts.

  


 

TimeBolt Results

Settings
- Remove silence longer than 0.5s
- Ignore detections shorter than 0.75s
- Left padding 0.01s, right padding 0.15s
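
To make the effect of these settings concrete, here is a small sketch that turns a list of detected silences into a cut list. It is not TimeBolt's algorithm. It assumes that right padding is audio kept after the preceding speech and left padding is audio kept before the following speech, so each cut shrinks by both. The 0.75 s "ignore detections" setting is not modeled because its exact semantics are not described here. The sample intervals are made up.

```python
MIN_SILENCE_S = 0.5  # "Remove silence longer than 0.5s"
LEFT_PAD_S = 0.01    # assumed: kept just before the next speech segment
RIGHT_PAD_S = 0.15   # assumed: kept just after the previous speech segment


def cuts_from_silences(silences: list) -> list:
    """Return (start, end) cuts for silences longer than the threshold."""
    cuts = []
    for start, end in silences:
        if end - start <= MIN_SILENCE_S:
            continue  # too short to count as dead air
        cut_start = start + RIGHT_PAD_S
        cut_end = end - LEFT_PAD_S
        if cut_end > cut_start:
            cuts.append((round(cut_start, 2), round(cut_end, 2)))
    return cuts


def removed_seconds(cuts: list) -> float:
    """Total duration removed by a cut list, in seconds."""
    return round(sum(end - start for start, end in cuts), 2)


# Made-up silence intervals in seconds.
silences = [(2.0, 2.3), (5.0, 6.2), (10.0, 10.6)]
cuts = cuts_from_silences(silences)
print(cuts)
print("Removed:", removed_seconds(cuts), "s")
```

With these sample values the 0.3 s pause is left alone and the two longer silences are trimmed, each keeping a short pad on either side.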

Performance
- Dead air removed: 10:07 (100%)
- Filler/repeats removed: 948 (100%)
- Final duration: 42:43
- Waste file: none

Bonus: With TurboMode (1.125x), final duration = 38:09

(TurboMode increases your rate of speech so you speak more words per minute without sounding like a chipmunk.)

Downloads:

Download TimeBolt Output 

Download TimeBolt Output with Turbo

Download TimeBolt Output SRT 

 


 

Descript Results

Settings
- Remove all filler words
- 'Avoid Harsh Cuts' turned off
- Remove gaps > 0.5s, shorten to 0.5s

Silence Detection

Filler Word Detection

Performance
- Dead air removed: 7:26 (of 10:07 baseline, ~73%)
- Filler/repeats removed: 623 (of 948 baseline, ~66%)
- Final duration: 47:54
- Waste file: 6:30
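
The percentages above are the share of the Umcheck baseline that Descript's first pass removed. A quick sketch of that calculation, with illustrative helper names:

```python
def to_seconds(clock: str) -> int:
    """Convert a 'min:sec' string such as '10:07' to whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def share_of_baseline(found: float, baseline: float) -> float:
    """Percent of the baseline that a tool detected and removed."""
    return found / baseline * 100


BASELINE_SILENCE_S = to_seconds("10:07")  # dead air in the baseline
BASELINE_FLAGGED_WORDS = 948              # filler + repeated words

silence_pct = share_of_baseline(to_seconds("7:26"), BASELINE_SILENCE_S)
filler_pct = share_of_baseline(623, BASELINE_FLAGGED_WORDS)

print(f"Dead air removed: {silence_pct:.1f}% of baseline")       # 73.5%
print(f"Filler/repeats removed: {filler_pct:.1f}% of baseline")  # 65.7%
```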

Downloads:

Descript Output Video

Descript Output SRT

Descript Waste

Descript Waste SRT


 

Gling Results

Settings
- Silence detection at 0.5s

Gling Silence Detection

Dead air + filler only (Bad Takes disabled)
- Final duration: 46:18
- Waste file: 4:17

Interpretation
'Bad Takes' removal cut actual content, not just filler. For unscripted video, this risks losing meaningful material. Both with and without 'Bad Takes' turned on, Gling left 4+ minutes of filler and silence.

Downloads:

Gling Output / SRT

Gling Waste Only Output / SRT


 

Loom Results

Settings
- No settings available. Toggles turned on: Remove Silence / Remove Filler Words

Loom Silence Detection and Filler Removal 

Dead air + filler
- Final duration: 57:47
- Waste file: 15:30

Downloads:

Loom Output / SRT

Loom Waste Only Output/SRT

 

Reproducibility

All files used in this study are available for download. Anyone can repeat the test and verify the results.

 


 

Conclusion

  • TimeBolt’s waveform engine = 0 missed cuts = 0 cleanup.

  • Descript, Gling, CapCut depend on transcripts → they miss low volume speech & soft pauses.

  • Loom has no editing layer → cannot be fixed at all.

  • The difference between AI editing and actual automation is measured in hours of repair time.

For creators editing long recordings (podcasts, webinars, lectures, YouTube videos) those minutes matter. 

Disclaimer: The results of this study are based on tests conducted and verified as of September 18, 2025. Software performance may change with future updates.

Update — October 2025:
Loom’s results were re-tested and verified using JSON data parsed through TimeBolt’s reverse-timeline analysis. The JSON confirmation ensures every missed silence and filler segment is accounted for, aligning this comparison with the same verification process used for Descript and Gling.