Your private AI and notebook.
Chat with your vault, PDFs, and the web — all on local models, all on your Mac.

One workspace, eight core workflows.
A chat-first home for everything in your vault.
Open Canto and you land in a chat composer that speaks fluent vault. Ask anything across your notes, PDFs, audio, video, and the web. Streaming answers come back with citations, scope controls, and a timeline rail so you always know what the model looked at.
- Welcome screen is now a chat — no blank-note tax
- Scope to all notes, a folder, a single note, or attached files
- Citations [1][2] resolve to the exact source line you asked about
- Works the same on a 2.1 GB model as on a 76 GB flagship
Nine built-in models, plus room to bring your own.
A curated Unsloth GGUF lineup with Granite 4.1 3B as the default and GPT-OSS 120B at the top end. Choose by size, run via Metal, and connect your own through Ollama or LM Studio when you need to.
- Granite 4.1 3BDefault — tool use, RAG, and quick rewritesAny 8 GB Mac2.1 GBDefault — tool use, RAG, and quick rewritesAny 8 GB Mac
- Qwen 3.5 4BEveryday writing, coding, and tool calls8 GB Mac · smooth on 16 GB2.7 GBEveryday writing, coding, and tool calls8 GB Mac · smooth on 16 GB
- Qwen 3.5 9BSharper everyday reasoning and planning16 GB Mac · daily on 24 GB+5.7 GBSharper everyday reasoning and planning16 GB Mac · daily on 24 GB+
- GPT-OSS 20BTool-first reasoning and structured outputs24 GB Mac · smooth on 32 GB+11.6 GBTool-first reasoning and structured outputs24 GB Mac · smooth on 32 GB+
- Qwen 3.6 27BLong-form reasoning and careful editing32 GB Mac (tight on 24 GB)16.8 GBLong-form reasoning and careful editing32 GB Mac (tight on 24 GB)
- GLM-4.7 FlashMoE speed at near-dense quality32 GB Mac / 36 GB M3 Pro18.3 GBMoE speed at near-dense quality32 GB Mac / 36 GB M3 Pro
“Comfortable on” assumes the model is your main workload. If you're running heavy apps alongside (browser with many tabs, Xcode, Logic), pick one tier down — Canto warns you in-app when free RAM is too low.
A degraded but usable launch path for low-memory or post-crash boots. Canto skips the chat model, vision worker, voice transcription, and link search so the editor and your notes always come up. One click brings AI back online — and if your last model is too big, Canto suggests a smaller one you already have.
How Canto compares — and what stays on your Mac.
| Feature | Canto | Notion AI | NotebookLM | Obsidian |
|---|---|---|---|---|
| Chat-first AI home | ●Canto Chat: vault, web, files | ◐AI sidebar in pages | ●Chat with sources (cloud) | ◐Community plugins |
| Private, on-device AI | ●9 local models + external | ◐Cloud AI only | ◐Cloud Gemini only | ◐Plugins / BYO models |
| Seamless chat ↔ editor handoff | ●Citation → companion pane | ◐Inline AI, no chat handoff | ○Read-only sources | ◐Plugins |
| Agent edits your workspace | ●Notes + notebooks, diff review | ◐Cloud agent (Business+) | ○Read-only chat | ◐Community plugins |
| Encrypted multi-Mac sync | ●iCloud Drive + selective | ◐Cloud sync, offline pages | ◐Google account (cloud) | ●Paid E2E Sync |
| Notes + executable notebooks | ●Rich notes + Python/JS/TS | ◐Code blocks only | ○No code execution | ◐Plugins / scripts |
| Chat with PDFs, audio, video | ●Local, cited, multimodal | ◐Cloud AI on uploads | ●Cloud + audio overviews | ◐Plugins |
| Voice transcription | ●Native Whisper + Metal | ◐Cloud Meeting Notes | ◐Audio overviews (output) | ◐Plugins |
| Semantic links and graph | ●Memory Links + graph | ◐Databases / AI search | ○No graph or backlinks | ●Graph; AI via plugins |
| MCP / external AI access | ●Built-in MCP server | ◐Enterprise MCP | ○No MCP | ◐Community plugins |
| Offline AI workflows | ●After model download | ◐Offline pages, no AI | ○Cloud-only | ◐Plugins / local setup |
| Pricing | ·One-time $29.99 | ·$20 / user / mo | ·Free + Plus $19.99 / mo | ·Free; paid sync |
Free for note-taking. Pay once for unlimited AI.
Just download.
Download Freev0.8.2- Unlimited notes & nested folders
- WikiLinks + Backlinks
- Code Notebooks (Python, JS, TS)
- Privacy-first iCloud Drive Sync
- Selective folder sync per Mac
- Encrypted exports & backups
- Split panes & daily notes
- Mermaid diagrams + LaTeX math
- Knowledge graph
- Full-text search
- Safe Mode + Low Memory Mode for tight Macs
- MCP Server — connect Claude, Cursor, Windsurf & more
- Send Diagnostics — redacted, reviewable support bundles
- 25 free AI queries to try
1 device · Lifetime updates
- Everything in Free, plus
- Unlimited AI queries (no quotas)
- Canto Chat — chat-first home for your vault, web & attachments
- Seamless handoff — citations open the source note in a companion pane
- Agent Chat — multi-step tool calls with diff review
- Web research with deep-research approval gates
- Multimodal attachments — PDFs, audio, image, and video with citations
- Vision on local models (vision-capable Qwen + fallback)
- 9 built-in local models, from 2.1 GB to 76.5 GB
- External endpoints — Ollama, LM Studio, or any OpenAI-compatible API
- @ mentions for notes, cells, attachments, sessions & selections
- Vault Manager for bulk organization
- Memory Links + Related Notes + semantic search
vs the alternatives
Frequently asked questions
- Canto Chat is the new chat-first home introduced in v0.8.0. When you open Canto, you land in a chat composer that can answer across your entire vault, attached PDFs, audio, video, and the web — streaming back citations you can click. It pairs with Seamless Handoff: every citation [1][2] opens the source note in a companion pane, and you can promote any chat answer into a fresh draft note in one keystroke. Think of it as a private NotebookLM that runs on your Mac, with full read/write into your notes when you want it.
- Three surfaces, one workspace. Canto Chat is for asking questions and exploring — your chat-first home. Notes are for writing and thinking — essays, research, knowledge bases with inline AI assistance. Notebooks are for code — Python, JavaScript, and TypeScript with instant execution and inline output. They share state: chat answers can hand off into notes, notes can be cited back inside chat, and Memory Links connect everything by meaning.
- Canto runs a native Python environment locally on your Mac — no WebAssembly, no cloud. Popular packages like numpy, pandas, and matplotlib install automatically when you import them. Variables are shared across Python, JavaScript, and TypeScript cells in the same notebook.
- Memory Links is Canto’s semantic linking system that automatically finds related notes as you write. Unlike manual tagging or keyword search, it uses AI embeddings to understand meaning — surfacing relevant insights from thousands of notes instantly, including code examples from your notebooks.
- Yes. After the first model download, Canto works completely offline. On planes, trains, or anywhere without internet — unlimited AI assistance, Memory Links, and semantic search are always available.
- Canto ships with 9 built-in local models, all Unsloth-hosted GGUF builds: Granite 4.1 3B (2.1 GB, default), Qwen 3.5 4B (2.7 GB), Qwen 3.5 9B (5.7 GB), GPT-OSS 20B (11.6 GB), Qwen 3.6 27B (16.8 GB), GLM-4.7 Flash (18.3 GB, 30B MoE), Qwen 3.6 35B A3B (22.1 GB), GPT-OSS 120B A5B (62.8 GB), and Qwen 3.5 122B A10B (76.5 GB). All support tool calling and run locally via Metal GPU acceleration. Larger multi-file models download in shards in parallel and resume cleanly if you cancel midway. You can also connect Ollama or LM Studio as external endpoints to use any additional model you’ve downloaded — including cloud-hosted providers like OpenAI and Anthropic through Ollama.
- Absolutely. Your notes, code, and Memory Links embeddings are stored locally in an AES-256 encrypted SQLite database. When you use iCloud Drive Sync, Canto writes encrypted sync data to your own iCloud Drive folder using a Sync Passphrase you choose. Canto does not upload readable note content to a Canto server.
- macOS 14 (Sonoma) or later. Requires Apple Silicon (M1 or later) for Metal GPU inference. 8 GB RAM is the minimum — enough for the default Granite 4.1 3B (2.1 GB) and Qwen 3.5 4B (2.7 GB). Enable Low Memory Mode in Settings to attempt larger models on smaller machines (may reduce context or cause instability). Recommended: 16 GB+ for Qwen 3.5 9B, 24–32 GB+ for GPT-OSS 20B and Qwen 3.6 27B, 32–48 GB for GLM-4.7 Flash and Qwen 3.6 35B A3B, and 96 GB+ for the GPT-OSS 120B and Qwen 3.5 122B flagship MoE builds. Canto also cold-starts local models on demand, so the app stays light at launch even with a large model selected. Models download once and cache locally.
- Yes. Canto ships a built-in MCP Server that lets Claude Desktop, Cursor, Claude Code, Windsurf, OpenClaw, and any other Model Context Protocol client read and write your vault locally. Toggle it on in Settings → MCP Server. The server only accepts connections from your own Mac, an optional bearer token adds a second lock, and tools are split into always-on vault tools plus a separate opt-in automation dispatcher for scripted workflows. Per-tool approvals from your client still apply, so you stay in the loop.
- Yes. Canto includes a Safe Mode launch path designed for tight machines. If free memory is low at boot, or if the previous launch crashed before the window finished loading, Canto starts in a stripped-down mode where your notes, editor, search, and UI still work normally. AI extras are deferred until you say go, and Canto can suggest a smaller model if your last one was too large.
- The Memory dashboard and status bar aggregate Canto’s related processes, including heavier helpers like vision workers. The Memory modal can also break the total down by process so you can see which subsystem is using the most RAM. Canto also includes a “Reclaim now” button in Settings → Memory that drops idle weights and frees RAM on demand without restarting the app.
- Open Help → Send Diagnostics (also available from Settings → About). Canto assembles a redacted JSON bundle covering memory, AI runtime, updater state, database health, recent crash metadata, and (optionally) a screenshot and recent log excerpts. You see the full payload — with secrets and personal content scrubbed — before anything leaves your Mac, and can either save it locally for your own records or upload it directly to the LonelyDuck diagnostics endpoint. Renderer and helper-process crashes are captured locally so the next report can explain what actually happened.
- Canto is a one-time $29.99 purchase — not a subscription. The license activates on 1 Mac and includes lifetime updates within the current major version. Canto v0.8.0 replaces what people typically pay $120–$240 per year for across Notion AI, NotebookLM Plus, Reflect, or ChatGPT Plus. To view activated devices, deactivate an old Mac, or transfer your license, create a LonelyDuck account using the same email from your purchase receipt and open your Account Dashboard.
Start free today.
Keep the note-taking app for free. Unlock the heavy AI workflows only when you actually want them.
Free forever for note-taking · 25 AI queries included · Unlimited AI for $29.99 one-time





