I Kept a Diary for Seven Years. An LLM Finally Read It.
I have been keeping an online diary since 2019. I started out in Evernote, documenting my life, my goals, and my frustrations. In 2023, I switched to GitHub: I exported all my notes to markdown and started committing new entries with git.
I have been running LLMs over those markdown files since 2023. Privacy has been a major constraint for my personal notes; I could never really use ChatGPT. So I had to experiment a lot.
I started by building a RAG application over my notes. To keep things private, I ran it on GPU instances on AWS. I asked the LLM questions like “What are some negative patterns about this person?” and “How has this person evolved over time?”. But the analysis was never on point: embeddings of phrases like “negative patterns” or “evolved” never retrieved the right passages from my notes.
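For the curious, the retrieval step looked roughly like the sketch below. This is a minimal reconstruction, not my exact code: the model name, paths, and query are illustrative. It also shows why abstract queries failed, since nearest-neighbor search over entry embeddings has nothing concrete to latch onto.

```python
# Minimal sketch of the RAG retrieval step (sentence-transformers assumed
# installed; model, paths, and query are illustrative, not my exact setup).
from pathlib import Path
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # runs fully locally

# Embed every diary entry once.
entries = [p.read_text() for p in Path("notes").glob("*.md")]
entry_embeddings = model.encode(entries, convert_to_tensor=True)

# Embed the question and take the 5 nearest entries.
query = "What are some negative patterns about this person?"
query_embedding = model.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, entry_embeddings, top_k=5)[0]

# The failure mode: "negative patterns" is an abstraction that never appears
# verbatim in a diary, so the search surfaces entries that merely sound
# negative instead of entries that evidence a recurring pattern.
for hit in hits:
    print(round(hit["score"], 3), entries[hit["corpus_id"]][:80])
```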
The next evolution was reasoning models. I tried DeepSeek-R1: I concatenated my most important notes and passed them to DeepSeek-R1 on my AWS GPU instance as a single context. That was when AI started being useful for self-analysis. I always read the model’s reasoning trace, because it carried better signal than the final answer. DeepSeek-R1 helped me plan the next phase of my life.
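The setup itself was deliberately simple. Here is a sketch of the idea, assuming the model is served behind an OpenAI-compatible endpoint (as vLLM or Ollama provide on a GPU instance); the URL, model name, and file layout are illustrative.

```python
# Sketch of the single-context approach; the endpoint URL, model name, and
# paths are illustrative assumptions, not my exact setup.
from pathlib import Path
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Concatenate the most important notes into one prompt.
notes = "\n\n---\n\n".join(
    p.read_text() for p in sorted(Path("notes/important").glob("*.md"))
)

response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{
        "role": "user",
        "content": f"{notes}\n\nHow has the author of these notes evolved over time?",
    }],
)

# R1 emits its chain of thought in <think>...</think> tags before the final
# answer; for me, that reasoning trace was the more useful half of the output.
print(response.choices[0].message.content)
```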
I also tried fine-tuning LLMs on my diary, to see whether my data would make a model more empathetic or improve its reasoning. It turns out it does not. Privacy was still a constraint, so to create training data I ran distilled Llama 3.1 models through Ollama on my Mac Mini and generated instruction-style pairs. Since I was asking Llama to invent a question whose answer would come from a diary entry, the training data ended up carrying Llama’s biases rather than insights from my notes.
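That generation loop looked roughly like the sketch below, using the ollama Python client; the model tag, paths, and prompt are illustrative.

```python
# Sketch of the instruction-data generation loop on a Mac Mini; model tag,
# paths, and prompt are illustrative assumptions.
import json
from pathlib import Path

import ollama

records = []
for entry in sorted(Path("notes").glob("*.md")):
    text = entry.read_text()
    # Ask the local model to invent a question that this entry answers.
    # This is where the bias creeps in: the question reflects what Llama
    # finds salient, not what the diary entry is actually about.
    resp = ollama.chat(
        model="llama3.1:8b",
        messages=[{
            "role": "user",
            "content": (
                "Write one question that the following diary entry "
                f"answers:\n\n{text}"
            ),
        }],
    )
    records.append({"instruction": resp["message"]["content"], "output": text})

Path("train.jsonl").write_text(
    "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
)
```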
I still hadn’t gotten the answer I was looking for: “How have I changed over the years?”
For that, I would need a powerful LLM and a harness that could orchestrate the analysis of 200+ markdown notes spread across years. Claude Code was an obvious candidate for the harness, but it didn’t really work with smaller models like Qwen and Gemma. The solution was to use the Anthropic API rather than a Claude.ai subscription. The Anthropic API guarantees that your data is deleted within 30 days unless it has been flagged against the usage policy (a pretty convenient loophole for Anthropic to retain chats without telling the user).
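Under the hood, this amounts to direct calls to the Anthropic Messages API with your own key, something like the sketch below; the model name and prompt are illustrative. In practice, Claude Code handles the file discovery and orchestration itself, which matters because 200+ notes do not fit in one context window.

```python
# Sketch of a direct Anthropic Messages API call with your own key; the
# model name and prompt are illustrative. Claude Code orchestrates many such
# calls while reading files incrementally.
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Naive version: stuff all notes into a single request. With 200+ notes this
# overflows the context window, which is exactly why a harness is needed.
notes = "\n\n---\n\n".join(
    p.read_text() for p in sorted(Path("notes").rglob("*.md"))
)

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{notes}\n\nHow has the author of these notes changed over the years?",
    }],
)
print(message.content[0].text)
```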
So I had a choice: trust the general goodwill of the people at Anthropic, or wait another 2-3 years for small language models to become capable enough to give me my answers.
I decided to go ahead with Claude Code on the Anthropic API, and it was a good decision. First off, I wouldn’t be writing this post if not for the answers I got from Claude. To give you a glimpse, this is what Claude told me:
- You’ve been working on the same problem for 7 years without realizing it. Your interest in RAG, citation graphs, knowledge retrieval, and agentic memory isn’t scattered curiosity — it’s one question asked in different forms: how do systems find and use knowledge?
- You rediscover the same lessons roughly every two years. The same principles — focus on fewer things, protect your energy, finish before starting something new — appear across years as if encountered fresh each time. Insight without an enforcement mechanism doesn’t stick.
- Your peak periods were documented but never reconstructed. You wrote down in detail what clarity felt like when you had it. But you never reverse-engineered the conditions that produced it. You treated those states as weather rather than something buildable.
- The vocabulary for your inner life got sharper even when the behavior didn’t change. Early entries describe events. Later entries have a taxonomy — specific frameworks for why a mental state is happening. That’s real growth, even when it doesn’t look like it from the inside.
- The most important sentence in 7 years of writing was an admission of defeat. Accepting you wouldn’t become what you had imagined changed the texture of everything written after it — less fantasy, more presence. Sometimes the most generative thing a journal can do is catch the moment you stop lying to yourself.
I will continue to be my own dataset. Eventually, I want to reach a point where AI can be proactive on my behalf and alert me whenever I am about to repeat the same mistake.