
Fewer API Calls, Faster Syncs: Performance Fixes in Kometa and PlexTraktSync

   


If you’re running a self-hosted media server, you know the setup: Kometa keeps your Plex libraries organized with collections and overlays, and PlexTraktSync keeps your watch history synced between Plex and Trakt. Both tools spend a lot of time talking to external APIs (TMDb, Plex, Trakt), and on large libraries, every unnecessary call adds up. The result is slow runs, rate limit errors, and incomplete syncs.

This batch of four PRs is all about trimming that fat: fixing cache bypasses that had no business being there, swapping O(n) list lookups for O(1) set lookups, stopping config from getting clobbered on re-login, and adding caching in the spots where it was simply missing. Small changes, real impact.


Kometa

PR #3116 — Stop Bypassing the Cache (and Stop Reloading Everything)

The project: Kometa is a Python-based metadata manager for Plex. It auto-builds collections using data from TMDb, Trakt, IMDb, and others — and it applies overlays to poster art (those little “4K”, “HDR”, or rating badges you see). It runs on a schedule, touching every item in your library.

The issue: Two unrelated problems were causing a flood of redundant API calls. First, check_filters() and check_missing_filters() in builder.py called get_movie() and get_show() with ignore_cache=True — which bypassed the SQLite cache entirely and fired a live TMDb API call for every filtered item, even when the data was already cached. On a library with thousands of items, that’s a lot of unnecessary network traffic. Second, overlays.py was calling reload(item, force=True) unconditionally for every item, triggering redundant Plex API calls on every normal run where items were already loaded.

The fix: Removed ignore_cache=True from three call sites in builder.py, and changed the overlay reload from force=True to force=self.library.reapply_overlays — so the forced reload only fires when the user has explicitly asked for it via the reapply_overlays flag.

# builder.py — before
tmdb_item = self.config.TMDb.get_movie(item_id, ignore_cache=True)
# builder.py — after
tmdb_item = self.config.TMDb.get_movie(item_id)
# overlays.py — before
self.library.reload(item, force=True)
# overlays.py — after
self.library.reload(item, force=self.library.reapply_overlays)

What I learned: If a function exposes an ignore_cache parameter, someone put effort into building that cache. Before calling with ignore_cache=True, it’s worth stopping to ask: is this override actually necessary here, or did it just creep in from a context where it made sense? In hot code paths, cache bypasses compound fast.
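
To make the compounding concrete, here is a minimal sketch of a cache-first lookup with an ignore_cache escape hatch. CachedClient, _fetch_remote() and count_calls() are hypothetical names invented for illustration, not Kometa's actual classes; the only thing borrowed from the PR is the shape of the ignore_cache parameter.

```python
import sqlite3


class CachedClient:
    """Hypothetical cache-backed API wrapper (not Kometa's real TMDb class)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE cache (item_id INTEGER PRIMARY KEY, title TEXT)")
        self.api_calls = 0

    def _fetch_remote(self, item_id):
        # Stand-in for a live HTTP request; it only counts invocations.
        self.api_calls += 1
        return f"title-{item_id}"

    def get_movie(self, item_id, ignore_cache=False):
        if not ignore_cache:
            row = self.db.execute(
                "SELECT title FROM cache WHERE item_id = ?", (item_id,)
            ).fetchone()
            if row is not None:
                return row[0]  # cache hit: no network traffic
        title = self._fetch_remote(item_id)
        self.db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (item_id, title))
        return title


def count_calls(ignore_cache, item_ids, runs=3):
    """Simulate several scheduled runs over the same library."""
    client = CachedClient()
    for _ in range(runs):
        for item_id in item_ids:
            client.get_movie(item_id, ignore_cache=ignore_cache)
    return client.api_calls


bypass_calls = count_calls(True, range(1000))   # every lookup goes remote
cached_calls = count_calls(False, range(1000))  # only the first run goes remote
print(bypass_calls, cached_calls)
```

In this toy setup the bypassing version makes one remote call per item per run, while the cached version only pays on the first run. A real cache would also need expiry, which the sketch leaves out.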

PR #3117 — Set Lookups Instead of List Scans

The issue: In add_to_collection() and run_collections_again(), Kometa checked whether a Plex item was already in a collection using Python’s in operator — against a plain list of PlexAPI objects. That’s an O(n) scan per item. For a 2,000-item collection checking 10,000 found items, you’re looking at potentially 20 million comparisons per run.

The fix: Before the loop, build a set of ratingKey integers from the collection items once. Then check membership against the set — O(1) — instead of the list.

# Before — O(n) list scan per item in the found set
if item in collection_items:
# After — O(n) build once, O(1) per lookup
collection_item_keys = {ci.ratingKey for ci in collection_items}
if item.ratingKey in collection_item_keys:

What I learned: Any time you have a membership check inside a loop, take a second to think about the data structure. A list is for iteration; a set is for “is this in here?” The set comprehension pays its O(n) cost once; every lookup after that is a constant-time hash check on average.
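
Here is a small self-contained sketch of the same swap. Item is a hypothetical stand-in for a PlexAPI object (only ratingKey matters here), and the sizes are made up to keep the example quick.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a PlexAPI object."""
    ratingKey: int
    title: str


collection_items = [Item(k, f"movie-{k}") for k in range(0, 500)]
found_items = [Item(k, f"movie-{k}") for k in range(250, 1250)]

# List scan: each `in` walks collection_items, comparing objects one by one.
missing_via_list = [i for i in found_items if i not in collection_items]

# Set lookup: build the key set once, then each check is a hash lookup.
collection_item_keys = {ci.ratingKey for ci in collection_items}
missing_via_set = [i for i in found_items if i.ratingKey not in collection_item_keys]

# Both approaches agree on which items still need to be added.
assert missing_via_list == missing_via_set
print(len(missing_via_set))
```

The two versions only agree because equality on these toy objects lines up with equality of ratingKey. That is worth confirming for the real objects before making the swap.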


PlexTraktSync


PR #2472 — Don’t Wipe Server Config on Re-Login

The project: PlexTraktSync synchronizes your Plex watch history with your Trakt account. Server configuration lives in a servers.yml file and includes options like a libraries whitelist to limit which Plex libraries get synced.

The issue: Re-logging into a Plex server wiped the user’s custom libraries whitelist and other server config. The root cause was in PlexServerConfig.asdict() — it always emitted the config key even when config=None. When that dict was merged back into the stored config, the None value silently clobbered whatever was already there.

The fix: Strip None-valued optional fields (id and config) from the asdict() output before the merge. No value, no clobber. A regression test was added to tests/test_config.py to confirm that re-login preserves an existing libraries whitelist.

# PlexServerConfig.py — strip None fields before returning
for key in ("id", "config"):
    if data[key] is None:
        del data[key]

What I learned: When merging config dictionaries, a missing key and a None-valued key should behave differently. Missing means “leave whatever’s there”; None can silently mean “overwrite with nothing.” It’s worth being explicit about what an absent optional field actually means at the point of merge.
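
The difference is easy to see in a toy version. ServerConfig below is a hypothetical, simplified stand-in for PlexServerConfig, and the merge is done with plain dict unpacking, which is an assumption on my part rather than the project's exact merge code.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ServerConfig:
    """Hypothetical, simplified stand-in for PlexServerConfig."""
    name: str
    id: Optional[str] = None
    config: Optional[dict] = None

    def asdict_all(self):
        # Emits every field, including the ones that are None.
        return asdict(self)

    def asdict_stripped(self):
        # Drops None-valued optional fields so they cannot overwrite anything.
        data = asdict(self)
        for key in ("id", "config"):
            if data[key] is None:
                del data[key]
        return data


stored = {"name": "home", "id": "abc123", "config": {"libraries": ["Movies"]}}
relogin = ServerConfig(name="home")  # a fresh login carries no id or config

clobbered = {**stored, **relogin.asdict_all()}
preserved = {**stored, **relogin.asdict_stripped()}

print(clobbered["config"])   # None: the whitelist is gone
print(preserved["config"])   # {'libraries': ['Movies']}
```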

PR #2473 — Reducing Redundant Trakt API Calls

The issue: On large libraries, PlexTraktSync was hitting Trakt’s rate limits more often than it should have. After digging through the API call paths, two separate problems turned up:

  • Non-cached show_collection property: Decorated as @property instead of @cached_property, so every access triggered a fresh API call to fetch the user’s Trakt show collection. In a sync loop, this can run hundreds of times.
  • No deduplication in get_plex_episodes(): The function called resolve_guid() once per episode, with no caching by show. A library section with 200 episodes across 3 shows triggered 200 resolve_guid() calls instead of 3.

The fix: Swap @property for @cached_property on show_collection, and add a show_cache dict keyed on grandparentGuid to deduplicate resolve_guid() calls:

# TraktApi.py — cache the property result
-@property
+@cached_property
def show_collection(self): ...
# Walker.py — deduplicate resolve_guid() calls per show
+show_cache: dict[str, Media | None] = {}
+if grandparent_guid not in show_cache:
+    show_cache[grandparent_guid] = self.mf.resolve_guid(guid)
+show = show_cache[grandparent_guid]
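
Both ideas can be tried out in isolation. The sketch below uses a hypothetical FakeTraktApi and a fake resolve_guid() that only count how often they are called; none of these names come from PlexTraktSync itself, apart from show_collection and the 200-episodes-across-3-shows scenario.

```python
from functools import cached_property


class FakeTraktApi:
    """Hypothetical API wrapper that counts simulated network fetches."""

    def __init__(self):
        self.fetches = 0

    def _fetch_collection(self):
        self.fetches += 1  # stand-in for a real HTTP request
        return ["show-a", "show-b"]

    @property
    def show_collection_uncached(self):
        return self._fetch_collection()

    @cached_property
    def show_collection(self):
        return self._fetch_collection()


api = FakeTraktApi()
for _ in range(100):
    api.show_collection_uncached
uncached_fetches = api.fetches  # one fetch per access

api.fetches = 0
for _ in range(100):
    api.show_collection
cached_fetches = api.fetches  # one fetch in total

# Per-show deduplication: resolve each show once, not once per episode.
calls = {"resolve": 0}


def resolve_guid(guid):
    calls["resolve"] += 1
    return f"resolved:{guid}"


episodes = (
    [("show-1", n) for n in range(100)]
    + [("show-2", n) for n in range(60)]
    + [("show-3", n) for n in range(40)]
)
show_cache = {}
for grandparent_guid, _episode in episodes:
    if grandparent_guid not in show_cache:
        show_cache[grandparent_guid] = resolve_guid(grandparent_guid)
    show = show_cache[grandparent_guid]

print(uncached_fetches, cached_fetches, calls["resolve"])
```

One trade-off to keep in mind: cached_property stores the value on the instance for its whole lifetime, so it only fits data that is safe to treat as fixed for the duration of a sync.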

The PR also originally added @time_limit() to search_by_id() to match the throttling applied to other Trakt methods — but a reviewer correctly pointed out that this was the wrong approach. search_by_id() makes GET requests, which Trakt allows at up to 1,000 calls per 5 minutes. The @time_limit() decorator enforces 1 call per second — the limit for POST requests (300 calls per 5 minutes). Applying the POST throttle to a GET endpoint would cap it at less than a third of its actual allowed rate.
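
The arithmetic behind that "less than a third" is quick to check. The limits below are the figures quoted above; the variable names are mine.

```python
WINDOW_SECONDS = 5 * 60  # Trakt counts calls per 5-minute window

GET_LIMIT = 1000   # GET calls allowed per window
POST_LIMIT = 300   # 1 call per second, expressed per window

get_min_interval = WINDOW_SECONDS / GET_LIMIT    # seconds between GET calls
post_min_interval = WINDOW_SECONDS / POST_LIMIT  # seconds between POST calls

# Fraction of the allowed GET rate left if GETs are throttled like POSTs.
throttled_fraction = POST_LIMIT / GET_LIMIT

print(f"GET: one call every {get_min_interval:.1f}s")
print(f"POST: one call every {post_min_interval:.1f}s")
print(f"POST throttle on a GET endpoint: {throttled_fraction:.0%} of the allowed rate")
```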

What I learned: “Apply the same pattern as the other methods” is a reasonable instinct, but it breaks down when those methods have different underlying constraints. Trakt’s rate limits are tiered by HTTP method, and GET and POST have meaningfully different ceilings, so a blanket throttle that works for one is an overcorrection for the other. It’s also a good reminder to ask the question the reviewer asked: did you actually measure improvement before and after? An assumption of benefit isn’t the same as a demonstrated one.


Wrap Up

All four of these changes share a common thread: code that was doing more work than necessary, usually because something that made sense in isolation compounded badly at scale. Dropping ignore_cache=True, swapping a list for a set, stripping None before a merge, caching a property — none of these are dramatic changes. But on a library with thousands of items running on a schedule, they add up to noticeably fewer API calls, fewer rate limit errors, and faster runs overall. And sometimes a reviewer catches that one of your proposed fixes would actually make things worse — which is exactly what code review is for.
