Context understanding combines structured activity tracking, browser heartbeats, accessibility-based app capture, OCR, and search. Together, these layers create a local-first record of what was happening on screen and what app was in focus, and make that context queryable later.

Overview

There are two main capture paths:
| Path | Role | Best at |
| --- | --- | --- |
| Watcher | Structured desktop context | App usage, browser activity, titles, accessibility text, session boundaries |
| Recorder | Visible-content capture | OCR text, thumbnail history, search evidence when native app context is weak |
Those feeds are indexed locally for text, semantic, and hybrid retrieval. When cloud memory is enabled, local chunks can also be uploaded to the memory pipeline for cloud-backed querying.

Context sources

Several signals contribute to the final context model.
| Source | What it contributes |
| --- | --- |
| Browser extension heartbeats | Active tab context, browser domain, tab/URL state, extension health |
| macOS accessibility snapshots | Focused UI state, selected text, document hints, visible app context |
| Window/title observation | Main window changes, title changes, focused UI changes |
| Activity tracking | Focused app, timing, AFK, lock/unlock, sleep/wake |
| OCR capture | Screen text that is visible but not exposed cleanly through native app APIs |
For rich desktop context, the primary path is browser + accessibility capture. OCR is used to extend or recover context when native capture is thin, degraded, or unavailable.

Watcher

The watcher is the structured activity layer. It handles app/browser usage, context snapshots, and session boundaries.

What it captures

| Signal | Stored data |
| --- | --- |
| Active app | Bundle ID, app name, process, and timing for the focused app |
| Window title | Off, truncated, hashed, or full title depending on privacy mode |
| Browser context | Domain by default, or full URL when enabled |
| Browser tab changes | Heartbeat-driven browser updates |
| AFK state | Idle detection so inactive time does not count as active usage |
| Session boundaries | Lock/unlock, sleep/wake, app switches, active segments |
| Daily rollups | Aggregated app usage for analytics and habit sync |
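
As a rough sketch, the AFK and session rules above can be thought of as folding polled samples into active segments. The names, types, and thresholds below are illustrative assumptions, not the app's actual implementation:

```python
from dataclasses import dataclass

AFK_TIMEOUT_S = 15 * 60  # matches the 15-minute default described later


@dataclass
class Sample:
    ts: float      # wall-clock time of the poll, in seconds
    app: str       # focused app identifier
    idle_s: float  # seconds since the last input event


def active_segments(samples: list[Sample], poll_s: float = 2.0):
    """Fold polled samples into (app, start, end) segments,
    dropping time where the user was AFK."""
    segments: list[tuple[str, float, float]] = []
    for s in samples:
        if s.idle_s >= AFK_TIMEOUT_S:
            continue  # AFK: inactive time does not count as active usage
        if segments and segments[-1][0] == s.app and s.ts - segments[-1][2] <= poll_s * 2:
            app, start, _ = segments[-1]
            segments[-1] = (app, start, s.ts)  # extend the current segment
        else:
            segments.append((s.app, s.ts, s.ts))  # app switch: open a new segment
    return segments
```

A lock/unlock or sleep/wake event would simply close the current segment the same way an app switch does.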

Accessibility capture

On macOS, the watcher reads accessibility trees to capture richer app context than app-time tracking alone can provide. Depending on the app, this can include:
| Accessibility-derived signal | Example |
| --- | --- |
| Focused or selected text | Selected code, editor text, highlighted content |
| Document identity | File path, document name, filename hints |
| Window and UI labels | Current panel, section, or tool state |
| Visible descendant text | Nearby visible text inside editors and app windows |
The watcher also uses event-driven accessibility observers for focused UI changes, main-window changes, and title changes, so context updates are not limited to coarse polling.
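
One way to picture the result of a snapshot is a small record combining the signals in the table above. The field names here are assumptions for illustration, not the app's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AXSnapshot:
    """Illustrative shape of one accessibility snapshot."""
    app: str                                             # focused app
    focused_text: Optional[str] = None                   # focused or selected text
    document_path: Optional[str] = None                  # file path / document name hint
    ui_labels: list[str] = field(default_factory=list)   # panel, section, or tool labels
    visible_text: list[str] = field(default_factory=list)  # nearby visible descendant text

    def is_rich(self) -> bool:
        # A snapshot is "rich" if it carries more than bare app focus;
        # thin snapshots are where OCR capture picks up the slack.
        return any([self.focused_text, self.document_path,
                    self.ui_labels, self.visible_text])
```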

Defaults and controls

| Setting | Current behavior |
| --- | --- |
| Poll interval | Roughly every 2 seconds by default |
| Title mode | off, truncate, hash, or full |
| URL mode | Domain-only by default |
| Incognito | Off by default |
| AFK timeout | 15 minutes by default |
| Exclusions | Per-app exclusions for sensitive or irrelevant tools |

Recorder

The recorder is the visible-content layer. It captures periodic screen frames, skips near-duplicates, runs Apple Vision OCR locally, and stores thumbnails plus extracted text in the local database.

Pipeline

| Stage | What happens |
| --- | --- |
| Capture | Read the current screen plus active-window metadata |
| Dedup | Skip frames that are visually and textually too similar to the last stored frame |
| OCR | Extract visible text on-device with Apple’s Vision framework |
| Thumbnailing | Save a lightweight image preview instead of continuous video |
| Storage | Write OCR text, quality score, metadata, and thumbnail path to local libSQL |
This layer is especially useful when accessibility output is sparse, noisy, or unavailable, and for building searchable evidence over time.

Search and memory

Local context is turned into searchable memory in two stages: local retrieval first, optional cloud memory second.

Local retrieval

| Mode | How it works | Best for |
| --- | --- | --- |
| Text search | FTS over OCR text and metadata | Exact terms, app names, localhost URLs, identifiers |
| Semantic search | Vector similarity over embedded chunks | Natural-language questions |
| Hybrid search | Lexical + vector signals combined | Best default for most context questions |
| Activity fallback | Watcher activity when semantic evidence is weak | Time-spent and app-usage questions |
The local hybrid bridge lets query paths call into the local vector + FTS index without requiring raw context to leave the machine first.
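
Reciprocal rank fusion is one common way to combine a lexical ranking and a vector ranking into a single hybrid order; the app's actual fusion may differ, so treat this as a sketch of the idea:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists (e.g. FTS and vector) into one order.

    Each list contributes 1 / (k + rank) per document; documents that rank
    well in multiple lists float to the top. k=60 is the conventional default.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The appeal of RRF here is that it needs no score normalization: FTS scores and cosine similarities live on incompatible scales, but ranks do not.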

Optional cloud memory

When cloud memory is enabled, local search_chunks can be uploaded to the memory pipeline. Those chunks are deduplicated, embedded, retained, and queried through the memory API.
| Route | Purpose |
| --- | --- |
| POST /api/memory/ingest-chunks | Ingest local chunks from the upload outbox |
| POST /api/memory/query | Query cloud-backed memory |
| GET /api/memory/health | Inspect pipeline freshness and indexing health |
Older watcher-memory routes still exist as temporary compatibility aliases, but the canonical path is now /api/memory/*.
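
A query against the canonical route might be built like this. The path comes from the table above; the base URL and the body fields ("query", "top_k") are assumptions for illustration, not the documented request schema:

```python
import json

BASE = "https://api.example.com"  # placeholder; the real base URL is deployment-specific


def build_memory_query(question: str, top_k: int = 8) -> tuple[str, bytes]:
    """Build the URL and JSON body for a cloud memory query (sketch)."""
    body = json.dumps({"query": question, "top_k": top_k}).encode()
    return f"{BASE}/api/memory/query", body
```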

What this enables

The combined model supports several different kinds of questions.
| Question | Primary backing signal |
| --- | --- |
| “What app did I spend the most time in?” | Activity rollups and segments |
| “Where did my computer time go today?” | Watcher activity plus sync-to-habit rollups |
| “When was I working on the settings page?” | Recorder OCR + local hybrid retrieval |
| “What was on screen when I was in Figma?” | OCR evidence + app/window context |
| “Summarize what I was doing this afternoon” | Hybrid retrieval plus activity context |
| “What did I work on recently across devices?” | Cloud memory, when enabled |

Privacy and control

These layers have separate controls and different storage characteristics.
| Control | Effect |
| --- | --- |
| App exclusions | Stop tracking for selected apps |
| Title mode | Disable titles or reduce sensitivity with truncation or hashing |
| URL mode | Limit browser capture to domains instead of full URLs |
| Incognito off by default | Ignore incognito browser tabs unless explicitly enabled |
| Local-first storage | Raw watcher data and recorder output are stored locally first |
| Optional cloud memory | Cloud indexing is a separate pipeline, not a requirement for local context capture |
The recorder stores deduplicated thumbnails plus OCR text, not continuous desktop video. The watcher stores structured activity metadata and native context snapshots rather than full screen imagery.

Health signals

The app surfaces several diagnostics so context quality is visible, not opaque.
| Signal | Meaning |
| --- | --- |
| context_enabled | Enough recent high-fidelity context is being captured |
| context_quality | high, medium, degraded, or unavailable based on recent coverage |
| Browser heartbeat live | The extension is reaching the watcher listener |
| Duplicate watcher detection | Multiple local listeners may be splitting context and degrading capture |
| Freshness | Search and memory responses can report stale or degraded OCR/semantic state |
If quality drops, the system can still fall back to weaker signals such as activity-only summaries, but screen-grounded answers will be less precise.
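
One plausible way to derive the context_quality levels above from recent coverage and staleness, with purely illustrative thresholds (the app's actual scoring is not documented here):

```python
def classify_context_quality(coverage: float, freshness_s: float,
                             max_stale_s: float = 600.0) -> str:
    """Map recent capture coverage (0..1) and OCR/semantic staleness (seconds)
    onto the context_quality levels from the health table."""
    if coverage <= 0.0 or freshness_s > 4 * max_stale_s:
        return "unavailable"   # nothing recent enough to ground an answer
    if freshness_s > max_stale_s or coverage < 0.3:
        return "degraded"      # fall back toward activity-only summaries
    if coverage < 0.7:
        return "medium"
    return "high"
```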