Enhanced Voice Memo Workflow—Merging Voice and Agents

Explain the problem as you see it

Enhancing the Voice Capture Experience

The Problem

The current implementation of voice capture, especially on mobile, is too narrowly focused: relevant information, such as text-based fields and other parts of the graph, is hidden by the UI. Relatedly, voice capture is currently, by design, non-interactive: you speak your thoughts into the device, and it transcribes them and performs its AI magic. This is good for many use cases, but I think we can do more.

Why is this a problem for you?

I’m a frequent user of voice capture, and I make use of autofill fields and the other affordances Tana currently offers. However, I often want to interact with fields or parts of my graph that aren’t available to me while I’m capturing a voice memo with my phone in my pocket, such as fields that contain reflection prompts or nodes that contain brainstorming ideas. I don’t want to have to memorize this information outright (too high a cognitive load), and there are times when pulling out my phone to check it is impractical or undesirable.

Suggest a solution

Voice capture—agentified

Since the team has expressed its commitment to voice as a primary modality in Tana, why not take it even further? We have the voice capture workflow, and we now have some support for custom agents; why keep the two separate?

Imagine you pick up your phone on a walk to do some brainstorming, similar to the use cases already shown in Tana’s marketing materials. You initiate something like what AI companies have all taken to calling “Live” mode (ChatGPT’s voice mode, Gemini Live, etc.), except it has knowledge of at least some of your graph (perhaps whatever you choose to share with it). Instead of merely speaking into Tana, you speak with Tana.

Early iterations would be fairly limited, but even so they could be extremely powerful. Say voice capture had access to the fields you’ve configured on the capture supertag. For example, suppose you have a #brainstorming supertag with fields like “ideas,” “challenges,” and “moonshots”; as you’re walking, you could ask it to prompt you using those fields. More advanced iterations could add richer conversational features.

At its most ambitious, what I’m describing is like the “Live” modes currently available, but with access to and knowledge of your graph. That in itself is a key differentiator from other AI services. But what I think is valuable about framing this as a combination of voice capture and agents is the ability to record your thoughts, just as you do with capture, while doing so in dialogue with an AI agent. Gemini Live and ChatGPT’s voice mode, at least at the moment, don’t seem to offer anything quite like this: those conversations seem meant to be ephemeral and, while “conversational,” are about providing the user with information rather than helping them develop and generate ideas.
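
To make the #brainstorming example above concrete, here’s a minimal sketch of what a field-driven prompting loop could look like. Everything in it is hypothetical: the SupertagField and CaptureSession types and the speak/listen/appendToField helpers are illustrative stand-ins, not Tana’s actual API.

```typescript
// Hypothetical sketch: an agent loop that walks a capture supertag's
// fields and uses each one as a spoken brainstorming prompt.
// None of these types or functions are Tana's real API.

interface SupertagField {
  name: string;    // e.g. "ideas", "challenges", "moonshots"
  prompt?: string; // optional reflection prompt stored on the field
}

interface CaptureSession {
  speak(text: string): Promise<void>;             // text-to-speech out
  listen(): Promise<string>;                      // speech-to-text in
  appendToField(field: string, value: string): void;
}

async function fieldDrivenBrainstorm(
  fields: SupertagField[],
  session: CaptureSession,
): Promise<void> {
  for (const field of fields) {
    // Use the stored reflection prompt if one exists, otherwise a default.
    const prompt = field.prompt ?? `What comes to mind for "${field.name}"?`;
    await session.speak(prompt);

    // Capture the spoken response and file it under the matching field,
    // so the transcript lands in the graph instead of staying ephemeral.
    const response = await session.listen();
    session.appendToField(field.name, response);
  }
}
```

The key difference from today’s capture flow is the speak step: the agent drives the conversation from your own fields rather than passively transcribing, and everything you say still lands in the graph the way a normal capture would.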

Crazy? Impossible? Could be cool!