Reinventing the Text Editor for LLMs

Each member of our founding team has over a decade of experience building software. If there are two things we've all learned, they are:

  1. Never reinvent the wheel, and
  2. Validate your ideas before investing too much time in them.

This is the story of why we reinvented the text editor to integrate with LLMs, and why we spent a year doing it.

Starting out simple

The idea for Reviso was to simply make the best way to write on the internet. We loved the capabilities LLMs gave us to generate content, brainstorm, and research, but we thought the UX was broken in two ways. First, waiting for an LLM to regurgitate my entire document just to make some small changes, like updating the title or fixing some grammar mistakes, is ridiculous. But this is standard practice with LLMs today. Even "revolutionary" products like OpenAI's canvas do this. It's slow, costly, and frustrating. We thought we could do better.

Second, so many generative AI solutions lock the document while the LLM churns through it. While the LLM works, I can't do anything to my document, which, in our eyes, was ridiculous. I want to be able to ask the LLM to update paragraph one while I work on paragraph two.

Initial solutions

We really wanted to start off simple. The first problem with creating an AI writing app is the difference between how we (humans) see documents and how LLMs see them. Humans interact with documents through rich, formatted interfaces (HTML/JS), but LLMs can only process raw text (and, in some cases, images). We needed some way of bridging this gap.

So our first task was to build a system that could send raw text (preferably in Markdown) to an LLM while still allowing users to edit the document, and then, when the LLM is done, seamlessly integrate the generated changes back into the user's version. Effectively, this creates a parallel workflow: the LLM works with an "offline" copy of the document, and once it completes its task, we merge its contributions back into the "live" document. If we could do this, we would enable real-time collaboration between human writers and AI assistants đŸ¤¯.
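Here's a minimal sketch of that workflow, assuming a CRDT with fork/merge semantics. The Doc interface and its methods are hypothetical illustrations, not Reviso's actual API:

```go
package workflow

// Doc is a hypothetical CRDT-backed document with fork/merge
// semantics (an illustration, not Reviso's real interface).
type Doc interface {
	Fork() Doc                     // snapshot the live doc for the LLM
	ApplyMarkdown(md string) error // apply generated text as edits
	Merge(other Doc) error         // fold offline edits back in
}

// RunLLMEdit lets the LLM edit a frozen copy while the user keeps
// typing in the live document; the CRDT reconciles both histories.
func RunLLMEdit(live Doc, prompt string, llm func(string) string) error {
	offline := live.Fork()   // the LLM sees a stable snapshot
	generated := llm(prompt) // meanwhile, the user edits live
	if err := offline.ApplyMarkdown(generated); err != nil {
		return err
	}
	return live.Merge(offline) // merge, don't overwrite
}
```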

Our initial strategy was to "just" build the underlying storage system. We envisioned creating a Conflict-free Replicated Data Type (CRDT), a sophisticated data structure that would enable us to merge real-time user edits with "offline" LLM-generated content.

Then, because we didn't want to build too much, we would leverage an established open-source rich-text editor like Quill, Slate, or ProseMirror to present the document to the user. Done: we would have an amazing AI writing app, one that could allow LLMs to be true real-time collaborators.

Our CRDT

We had never built a CRDT, so first things first, we dove headfirst into a sea of research papers. I even bought an e-reader so I could read them before bed. Each one offered a unique perspective on collaborative editing.

Among the wealth of information we consumed, one paper stood out: "The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing." This paper, by Matthew Weidner and Martin Kleppmann, introduced the Fugue structure, which seemed to offer the perfect balance of efficiency and elegance that we were after.

The Fugue structure is designed to address a long-standing problem in collaborative text editing: the interleaving of concurrent insertions. When two users simultaneously insert text at the same position, many existing algorithms might interleave the inserted texts, potentially corrupting the document. Fugue aims to minimize this interleaving, ensuring that concurrent edits result in more logical and readable outcomes.
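To see the problem concretely, here's a toy Go simulation (not any real CRDT): two users concurrently type "Alice" and "Bob" at the same position, and a naive merge that tie-breaks characters by (counter, site) shuffles the two words together:

```go
package main

import (
	"fmt"
	"sort"
)

// char models one concurrently inserted character: which user (site)
// typed it, in what order (counter), and the rune itself.
type char struct {
	site    string
	counter int
	r       rune
}

func main() {
	var chars []char
	for i, r := range "Alice" {
		chars = append(chars, char{"A", i, r})
	}
	for i, r := range "Bob" {
		chars = append(chars, char{"B", i, r})
	}
	// The naive merge orders concurrent characters by (counter, site)
	// instead of keeping each user's run contiguous.
	sort.Slice(chars, func(i, j int) bool {
		if chars[i].counter != chars[j].counter {
			return chars[i].counter < chars[j].counter
		}
		return chars[i].site < chars[j].site
	})
	out := make([]rune, 0, len(chars))
	for _, c := range chars {
		out = append(out, c.r)
	}
	fmt.Println(string(out)) // "ABloibce": interleaved, not "AliceBob"
}
```

Fugue avoids this by attaching each new character to its predecessor in a tree, so one user's run of text stays contiguous even under concurrency.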

Excited by the possibilities, we rolled up our sleeves and started implementing a Fugue-based CRDT. In the process, we soon learned one downside of the Fugue structure: brilliant as it was for many operations, it had an Achilles' heel in jumping to specific parts of the document. Imagine trying to navigate a massive tree structure every time a user wanted to edit the middle of a document. Not exactly the seamless experience we were aiming for.

We knew we needed something more. So we combined our cutting-edge Fugue data structure with an old-school Rope data structure. The Rope, a tried-and-true data structure in text editing, allowed us to quickly access any part of the document, while the Fugue handled the intricate details of merging changes. We called our new creation: Rogue.

This hybrid approach gave us the best of both worlds. The Rope structure provided efficient random access and insertion operations, crucial for a responsive user interface. Meanwhile, the Fugue structure ensured that concurrent edits were merged in a way that preserved user intent and minimized unexpected interleaving.
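To give a flavor of the shape this takes, here's a simplified sketch (not our production code) of rope nodes that cache subtree lengths for O(log n) position lookups, with leaves holding runs of Fugue-identified characters:

```go
package rogue

// ID identifies a character in the Fugue tree: the replica that
// created it plus a per-replica counter. (A simplified sketch, not
// the production Rogue types.)
type ID struct {
	Replica string
	Seq     int
}

// leaf holds a run of characters created consecutively by one
// replica, so IDs can be stored once and extended by offset.
type leaf struct {
	start ID     // ID of the first character in the run
	text  []rune // the visible characters
}

// node is a rope node; internal nodes cache the length of the left
// subtree so we can binary-search our way to any document offset.
type node struct {
	left, right *node
	leftLen     int   // characters in the left subtree
	l           *leaf // non-nil only for leaves
}

// runeAt descends the rope to the character at offset pos without
// ever walking the Fugue tree itself.
func (n *node) runeAt(pos int) rune {
	for n.l == nil {
		if pos < n.leftLen {
			n = n.left
		} else {
			pos -= n.leftLen
			n = n.right
		}
	}
	return n.l.text[pos]
}
```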

We now had a CRDT that could handle real-time collaborative editing, seamlessly merging changes from multiple sources while allowing lightning-fast access to any part of the document. This combination of Fugue and Rope laid the foundation for our AI-assisted collaborative writing tool, enabling us to create a system where human writers and AI assistants could truly work side by side in real-time.

Initial problems

It turns out converting Markdown back into operations we could apply to our document is hard. Markdown is not standardized. You think it is... but it really isn't. Lists can start with a * or -. Whitespace can mean different things in different contexts. The lack of a strict specification makes parsing Markdown and converting it into a structured format challenging, especially when dealing with edge cases and variations in syntax. But that was a challenge we could push through.
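To give a sense of the variation: under CommonMark, * and - lists are equivalent. Here's a quick check using goldmark, a CommonMark parser for Go (used purely for illustration; we're not claiming it's the parser in our stack):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/yuin/goldmark"
)

func main() {
	// Two syntactically different documents that mean the same list.
	variants := [][]byte{
		[]byte("* one\n* two\n"),
		[]byte("- one\n- two\n"),
	}
	for _, src := range variants {
		var buf bytes.Buffer
		if err := goldmark.Convert(src, &buf); err != nil {
			panic(err)
		}
		fmt.Print(buf.String()) // both emit the same <ul><li>... HTML
	}
}
```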

What turns out to be even harder is keeping a browser's representation of the document in sync with the CRDT version. This synchronization challenge arises from several factors:

  1. Different data models: Each rich-text editor has its own underlying data model to represent a document, and each makes assumptions about how the document should be stored. For example, Quill stores all line-level formats on the newline after the content (see the sketch after this list). Our CRDT needs to make its own assumptions about how data is stored; the difficulty is aligning the two.
  2. Real-time updates: CRDTs allow for concurrent edits from multiple sources, including AI-generated content. Reflecting these changes in real-time in the browser, without disrupting the user's current view or cursor position, is technically demanding.
  3. Performance considerations: Constantly updating the browser's DOM based on CRDT changes can be computationally expensive, especially for large documents or frequent edits.
  4. Handling complex operations: Some editing operations, like moving blocks of text or applying formatting changes, can be straightforward in a CRDT but complex to represent in a browser's DOM structure.
  5. Maintaining consistency: Ensuring that the browser's representation always accurately reflects the CRDT state, even in the face of network latency or temporary disconnections, adds another layer of complexity.
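To make the first point concrete: in Quill's Delta format, line-level attributes like headers live on the trailing newline rather than on the text itself. A minimal sketch:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Op mirrors the shape of a Quill Delta operation.
type Op struct {
	Insert     string                 `json:"insert"`
	Attributes map[string]interface{} `json:"attributes,omitempty"`
}

func main() {
	// "Title" as an H1 in Quill: the header attribute is attached to
	// the "\n" that ends the line, not to the word "Title".
	delta := []Op{
		{Insert: "Title"},
		{Insert: "\n", Attributes: map[string]interface{}{"header": 1}},
	}
	b, _ := json.Marshal(delta)
	fmt.Println(string(b))
}
```

A CRDT that stores formats as spans over characters has to translate to and from this convention on every sync, and each editor draws these lines differently.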

These challenges pushed us to develop sophisticated synchronization algorithms and optimized update mechanisms. Each time we thought we had it solved, we'd find another synchronization issue. It was time to try something else.

Biting the bullet

We had tried all the easy things, and we still didn't have the product we really wanted. We knew this was not a technical feat another company would easily be able to fast-follow, and maybe this would be our "moat" in the AI writing space. So we bit the bullet and started building a brand-new rich text editor from scratch. Well, mostly from scratch; we were keeping our CRDT, as it was working well.

With our decision made, we rolled up our sleeves and began development. Our journey began with a pivot in our tech stack. Initially, we had been developing both Go (for the backend) and JavaScript (for the frontend, duh) versions of our editor. However, we soon realized that to achieve the seamless integration and performance we desired, we needed to go all-in on Go. We decided to scrap our JavaScript version entirely and focus on shipping our Go-based editor to the frontend as a WebAssembly (WASM) binary. This move allowed us to leverage Go's performance benefits and maintain a unified codebase across our entire stack.
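As a rough sketch of that WASM boundary (the function name is illustrative, not our real export), the Go side registers callbacks the browser can invoke:

```go
//go:build js && wasm

// Built with: GOOS=js GOARCH=wasm go build -o editor.wasm
package main

import "syscall/js"

// applyEdit receives (position, text) from JavaScript and would
// forward them to the CRDT as an insert operation.
func applyEdit(this js.Value, args []js.Value) interface{} {
	pos, text := args[0].Int(), args[1].String()
	_, _ = pos, text // placeholder: apply the insert to the CRDT here
	return nil
}

func main() {
	js.Global().Set("applyEdit", js.FuncOf(applyEdit))
	select {} // block forever so exported functions stay callable
}
```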

The next hurdle was bridging the gap between our CRDT and the user interface. We had to build a system that could take our CRDT and render it as HTML, effectively translating our complex data structure into something visually coherent for users. This was no small feat, requiring us to develop intricate algorithms that could efficiently transform our CRDT into a format that browsers could understand and display.
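In simplified form (assumed types, and without the DOM-diffing a real renderer needs to avoid rewriting untouched nodes), the render path looks like:

```go
package render

import (
	"html"
	"strings"
)

// Block is an assumed intermediate form between the CRDT and the
// DOM: the document's visible text sliced into block elements.
type Block struct {
	Tag  string // "p", "h1", "li", ...
	Text string
}

// ToHTML emits browser-facing markup for a slice of blocks.
func ToHTML(blocks []Block) string {
	var b strings.Builder
	for _, blk := range blocks {
		b.WriteString("<" + blk.Tag + ">")
		b.WriteString(html.EscapeString(blk.Text))
		b.WriteString("</" + blk.Tag + ">")
	}
	return b.String()
}
```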

But rendering was only half the battle. We also needed to create a mechanism to handle user input. This meant developing a way to receive user input messages, determine their position within our document tree, and create corresponding operations that we could apply to our CRDT.
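A hedged sketch of that input path (illustrative types, not Reviso's): the trick is converting a volatile cursor offset into a stable character ID before building the operation, so concurrent edits can't invalidate it:

```go
package input

// InputEvent is what the frontend sends for a keystroke or paste.
type InputEvent struct {
	Kind string // "insert" or "delete"
	Pos  int    // offset in the rendered text
	Text string
}

// ID names a character independently of its current position.
type ID struct {
	Replica string
	Seq     int
}

// Editor is the slice of the CRDT that the input path needs.
type Editor interface {
	IDAt(pos int) ID                 // rope lookup: offset -> stable ID
	InsertAfter(anchor ID, s string) // becomes a Fugue insert op
	Delete(anchor ID, n int)         // becomes a Fugue delete op
}

// Handle turns a browser event into a CRDT operation.
func Handle(e Editor, ev InputEvent) {
	anchor := e.IDAt(ev.Pos)
	switch ev.Kind {
	case "insert":
		e.InsertAfter(anchor, ev.Text)
	case "delete":
		e.Delete(anchor, len(ev.Text))
	}
}
```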

To top it all off, we were committed to making our document editing experience collaborative. This required us to implement real-time multi-user support, allowing multiple users (and AI assistants) to work on the same document simultaneously without conflicts. We had to ensure that changes from various sources could be seamlessly integrated, maintaining document consistency across all collaborators.

As you might imagine, this ambitious undertaking wasn't a weekend project. It took us the better part of a year to build, test, and refine our custom text editor. We encountered countless challenges along the way, from performance bottlenecks to edge cases in collaborative editing that we hadn't initially considered. But with each obstacle overcome, we grew more confident that we were creating something truly unique and powerful.

The result of our efforts is a text editor that's not just a tool for writing, but a platform for next-generation collaboration between humans and AI. It's an editor that understands the nuances of both human input and LLM-generated content, capable of merging them seamlessly in real-time. While it may have been a long and arduous journey, the capabilities we've unlocked make every late night and debugging session worth it.

Benefits of bullet biting

Our decision to build a custom text editor from scratch wasn't just about overcoming technical challenges—it opened up a world of possibilities that off-the-shelf solutions simply couldn't match. Here are some of the key benefits we've realized:

Superpowered Undo

Let's face it: LLMs, while powerful, are far from perfect when it comes to writing. They can make mistakes, misinterpret context, or generate content that doesn't quite hit the mark. That's why we've implemented an undo feature that goes beyond the typical ctrl+z. Our undo capability works across sessions, allowing users to revert changes made in previous editing rounds. This is particularly crucial when collaborating with AI, as it provides a safety net for writers. If an LLM-generated change doesn't align with your vision, you can easily roll back to a previous version without losing your train of thought or the overall flow of your document.
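Conceptually (a simplified sketch, not our real op log), undo appends the inverse of a past operation as a brand-new operation, which is why it can reach back across sessions instead of just popping a per-session stack:

```go
package undo

// Op is a simplified editor operation. Invert produces the operation
// that reverses it: a delete for an insert, an insert for a delete.
type Op struct {
	Author string
	Kind   string // "insert" or "delete"
	Pos    int
	Text   string
}

func (o Op) Invert() Op {
	inv := o
	if o.Kind == "insert" {
		inv.Kind = "delete"
	} else {
		inv.Kind = "insert"
	}
	return inv
}

// UndoLast finds the most recent op by author, from this session or
// any earlier one, and returns its inverse to append to the log.
func UndoLast(log []Op, author string) (Op, bool) {
	for i := len(log) - 1; i >= 0; i-- {
		if log[i].Author == author {
			return log[i].Invert(), true
		}
	}
	return Op{}, false
}
```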

Intuitive Diff View

When an LLM makes updates to your document, you need to know exactly what's changed. Our custom diff view highlights these changes in a clear, easy-to-understand format. This feature is essential for maintaining control over your content and understanding the AI's contributions. The diff view allows you to quickly scan through LLM-suggested edits, accepting or rejecting them as you see fit. This level of granular control ensures that the final document is truly a collaboration between human creativity and AI assistance, rather than a stream of LLM slop.
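One way to model this (illustrative types, not our actual diff machinery): each suggestion is a span over the original text, and only the spans the user accepts get spliced in:

```go
package diff

// Span marks one contiguous LLM suggestion: original[Start:End] is
// the text the LLM wants to replace with New. Spans are assumed
// sorted by Start and non-overlapping.
type Span struct {
	Start, End int
	New        string
}

// Resolve applies only the accepted suggestions; rejected spans
// leave the original text untouched.
func Resolve(original string, spans []Span, accepted map[int]bool) string {
	out := original
	// Splice right-to-left so earlier offsets stay valid.
	for i := len(spans) - 1; i >= 0; i-- {
		if !accepted[i] {
			continue
		}
		s := spans[i]
		out = out[:s.Start] + s.New + out[s.End:]
	}
	return out
}
```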

Potential for Offline Edits

While we're not fully utilizing this feature at the moment, our decision to build the rich-text editor in Go opens up exciting possibilities for offline processing. Because we've built the entire stack, including the editor, in Go, we can potentially run the same operations offline that we currently perform online.

This capability could be leveraged in numerous ways:

  • Updating summary documents in the background as new data comes in, ensuring your executive briefs are always current.
  • Transforming meeting transcripts into structured documents or action item lists without requiring constant network connectivity.
  • Performing resource-intensive document analysis or formatting tasks in the background, freeing up the user interface for continued editing.
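The reason this is even possible: the editor core is plain Go, so the code that ships to the browser as a WASM binary also compiles natively for headless server-side jobs. A sketch, with a stand-in type for the shared core:

```go
// Browser build:  GOOS=js GOARCH=wasm go build -o editor.wasm
// Headless build: go build
package main

import (
	"fmt"
	"strings"
)

// editor is a placeholder for the shared Go editor core.
type editor struct{ text strings.Builder }

func (e *editor) Insert(s string) { e.text.WriteString(s) }

func main() {
	// A background job: turn a transcript into an action-item doc
	// with no browser involved.
	var doc editor
	doc.Insert("# Action items\n")
	doc.Insert("- Follow up with the design team\n")
	fmt.Println(doc.text.String())
}
```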

The offline editing potential exemplifies how our "bullet-biting" approach to building a custom editor has set us up for future innovations that would be challenging or impossible with traditional web-based editors.

By taking control of every aspect of our text editor, we've created a foundation that not only solves our immediate challenges but also positions us to rapidly adapt to the evolving landscape of AI-assisted writing. As LLM technology continues to advance, our custom editor gives us the flexibility to integrate new features and capabilities quickly, ensuring that Reviso remains at the cutting edge of human-AI collaboration in content creation.

Breaking Rules to Break New Ground

Did we make the right decision to build a custom text editor for LLMs? I don't know! You tell us! Sign up for revi.so and give it a try đŸ˜†. But by controlling every aspect of the editor, we've created a tool that pushes the boundaries of human-AI collaboration in content creation in ways that would be impossible with off-the-shelf solutions.

Our approach has allowed us to optimize performance and rapidly innovate in a field where the pace of change is relentless. As LLMs continue to advance, our custom editor positions us to continually integrate the latest capabilities.

Knowing when to break the rules is as important as knowing the rules themselves. In the case of Reviso, breaking the conventional wisdom has allowed us to create something truly special.