Does AI Dream of Spherical Cows?


I've been building an app with Claude Sonnet - purely vibe coding. It's strictly a business utility to help streamline some recurring team coordination activities - fairly straightforward, with a simple UI and some outputs to Excel and PowerPoint.

Everything was going swimmingly - 'AI is coming for us' swimmingly - until one single line of code turned out to be a nightmare.

cell.GridSpan = 2;

That single line took all the productivity improvements and efficiencies I'd gained through prompt-only vibe coding and threw them out the window. Instead it became a really interesting deep dive into how poorly documented libraries and online forum comments guide AI.

I wanted to generate a table with 2 columns, with the first row spanning across both columns. This is a common ability in Word, Excel, and PowerPoint: the user just selects both cells in the desired row and clicks 'Merge'.


Achieving this in code can be harder than it needs to be. Over the past decade there have been multiple posts and discussions about how to get this to work correctly and what is most reliable, and a lot of it depends on the version of the library, the version of PowerPoint, etc. In my specific case, I was fortunate enough that I could get away with a single line of code setting the cell's GridSpan to 2.
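For context, a horizontal merge in PresentationML involves slightly more than the span itself: the spanning cell declares GridSpan, and the covered cell still exists in the XML, flagged as horizontally merged. Here's a minimal sketch assuming the OpenXML SDK's DrawingML types (`DocumentFormat.OpenXml.Drawing`); the method name and row height are illustrative:

```csharp
using DocumentFormat.OpenXml.Drawing; // NuGet: DocumentFormat.OpenXml

// Sketch: a two-column row whose first cell spans both columns.
// The spanning cell sets GridSpan; the covered cell is still emitted,
// marked with HorizontalMerge so the grid stays consistent.
static TableRow BuildMergedHeaderRow(string headerText)
{
    var spanningCell = new TableCell(
        new TextBody(
            new BodyProperties(),
            new Paragraph(new Run(new Text(headerText)))),
        new TableCellProperties())
    {
        GridSpan = 2 // cover both grid columns
    };

    var coveredCell = new TableCell(
        new TextBody(new BodyProperties(), new Paragraph()),
        new TableCellProperties())
    {
        HorizontalMerge = true // placeholder for the merged-away column
    };

    return new TableRow(spanningCell, coveredCell) { Height = 370_840L }; // height in EMUs
}
```

Leaving out the covered HorizontalMerge cell is one of the ways a merge can look right in code but leave the table's grid inconsistent in the generated file.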

But due to the number of well-documented challenges over the years, Claude didn't consider that an option. And that's where we run into problem #1:

1. Over-caution Bias and Erring towards Safer Implementations: Many agents, Claude included, are very cautious. That's a good thing. Claude decided to avoid GridSpan and instead used a more complex implementation that it felt was more reliable. It wasn't solving for my setup; it was trying to solve for the most generic setup (my prompt wasn't specific).

The problem, however, is the more "reliable" solution was not actually a solution. It tried a few approaches and eventually settled on something that would visually look like it was merged. At least... the code was written to create the visual illusion of a merged cell. Reality was different. While the logic was sound, Claude had no way of really "seeing" that it wasn't working. That gets to problem #2:

2. Success is Logic-Driven, Not Experience-Driven: Agents gauge their success through the compiler building successfully, test cases passing, etc. What they don't (yet) do is "look" at their outputs and measure whether the experienced reality matches what the prompt asked for. It's possible to set this up, but it's not an out-of-the-box solution just yet.
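A crude version of that feedback loop isn't hard to bolt on yourself: a .pptx file is just a zip archive, so after generating the deck you can open the slide XML and confirm the markup you asked for actually landed in the file. A rough sketch (the path and entry name here are assumptions for the example):

```csharp
using System.IO;
using System.IO.Compression; // a .pptx is just a zip archive

// Sketch: after generating the deck, crack it open and confirm the
// markup we asked for actually made it into the slide XML.
static bool SlideHasGridSpan(string pptxPath, string slideEntry = "ppt/slides/slide1.xml")
{
    using var archive = ZipFile.OpenRead(pptxPath);
    var entry = archive.GetEntry(slideEntry);
    if (entry == null) return false;

    using var reader = new StreamReader(entry.Open());
    var slideXml = reader.ReadToEnd();

    // Crude string check - enough to catch a silently dropped merge.
    return slideXml.Contains("gridSpan=\"2\"");
}
```

It won't tell you the slide *looks* right, but it does catch the case where the agent confidently reports success while the attribute never made it into the output.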

The result of measuring success just by looking at code is an agent that will confidently tell you "I've created a custom solution that delivers a beautiful table with a header merged across columns" while delivering something else entirely.


When you ask it to try again, when you push it to think deeper, the LLM will change, adjust, modify its strategy, and produce a lot more complexity. But that doesn't guarantee it will satisfy the problem. It's like physicists approximating spherical cows - it may work on paper, but not in the real world.

This pushed me to go and manually write the line of code myself, and voila! I had my table.

Until... I moved on to my next prompt. In the course of handling it, Claude rewrote large areas of the logic and overwrote my manually inserted line.

Frustrating, but more my fault. Claude was tracking its implementation logic in an instructions file and within the instruction file it had written a note to itself: "Important - Do not use GridSpan as it causes errors and can corrupt the output file."

I modified that line and told it to use GridSpan, removing the disclaimer and instead explicitly stating it was the preferred method.

This worked... for a little bit. Until, at some point Claude decided (unprompted) to optimize some code, and once more removed the GridSpan.

This gets to problem #3:

3. LLMs aim to please (and sometimes lose context of how to best please)

The road to hell is paved with good intentions - and LLMs are very well-intentioned. They are risk-averse and go with what they believe to be reliable. This problem is particularly apparent when working across multiple sessions and contexts.

Had I known that GridSpan could be so problematic in other implementations, I could have laid out clear instructions from the beginning on the importance of its usage. But because this started as an experiment, I didn't give enough guidance for it to maintain and manage context.

With each subsequent prompt, Claude would find ways of reverting the GridSpan code - spending more cycles rewriting, and taking longer to solve a problem that - in its current context - was unrelated to the recent prompt.

At one point, it became so confused by the GridSpan while working on a simple prompt that it gave up and told me to make the change myself:

"Given the challenges I've had with corrupting the file, let me provide you a comprehensive summary of what needs to be implemented…"

The solution it told me to implement did not use GridSpan.

In desperation, I decided to stop fighting Claude. I decided to stop trying to "do it right by doing it myself." Instead I told Claude: "I wrote code to use GridSpan on line 682. It works. I have tested this thoroughly. The output is exactly as desired. Please use GridSpan going forward and do not optimize, or correct this. Make this a core part of the instructions."

And finally, Claude began to consistently and reliably use GridSpan and not remove it. That is, until, problem #4:

4. Scapegoating

Some time after that, I gave Claude a simple prompt to set the height of a table cell. Given how PowerPoint sizes things, this required some calculation in the code. I noticed Claude was curiously off by a factor of 2. I could have pointed this out - instead (experimenting with its ability to problem-solve), I prompted: "The height is off - please recheck your logic."
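For the curious: PowerPoint's XML measures lengths in EMUs (English Metric Units), so any height you set has to go through a unit conversion, and mixing up the constants is an easy way to end up off by a constant factor. A quick reference sketch:

```csharp
using System;

static class Emu
{
    // English Metric Units - the unit PowerPoint's XML uses for lengths.
    public const long PerInch = 914_400;
    public const long PerPoint = 12_700;  // 1 pt = 1/72 inch
    public const long PerCm = 360_000;

    public static long FromPoints(double points) =>
        (long)Math.Round(points * PerPoint);
}

// e.g. Emu.FromPoints(28.8) == 365_760
```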

Some time later, it came back with: "The current implementation is using GridSpan, which is problematic. Let me rewrite that..."

I had to stop it, and revert the last change.


Something we often forget is that LLMs lack causal reasoning. They often substitute "correlation in training data" for "causal diagnosis" through pattern matching, leading them to misidentify root causes.

Despite all these issues, the real problem wasn't Claude. It was me (insert Taylor Swift reference for easy laughs). With my over-excitement, I started to run before I was done crawling. Here's what I should have done:

First: It's worth acknowledging that the code libraries used to manipulate PowerPoint are not the best. They're not rich with examples, and the available information ranges from very old efforts (2012) to more recent articles. Given that Word, Excel, and PowerPoint all have mergeable tables and each of them has a different implementation, there's also a lot of confusion about which library is the right one. All this adds up to a questionable knowledge base.

This may seem like a niche problem - but, as an honest rhetorical question about your own aging codebase: how good is your documentation? Even 90% code coverage leaves a lot to the imagination. If you're setting up an LLM to work on your own codebase, it will only be as good as the documentation and test cases. Be careful with areas that have code smells, missing comments, or legacy cruft.

If you're using libraries that can be problematic, the next steps will help...

Second: The danger with vibe coding is that if you start without structure and just experiment, it becomes harder and harder to establish that structure and those guardrails later on. It's a lot like traditional coding - Test-Driven Development arose as a methodology to avoid exactly these kinds of problems. It's fine to experiment, but when things start to take shape, it's best to restart the right way.

Third: Be explicit, atomic, and directive. Don't be conversational. Think and talk like a computer, and be consistent in how you instruct. Setting up and working closely with instruction files, asking the agent to check with you before making certain changes, prompting for atomic changes, and telling the agent to "lock" code areas against further changes are all important.

Fourth: Where possible, create feedback loops where the output can be fed back into the agent so it can evaluate its own mistakes. While this is helpful, be mindful of...

Fifth: Long-running tasks are dangerous. It's akin to letting your 4-year-old play unsupervised. A couple of minutes? No big deal. Five minutes? Ten? Start worrying.

Finally: The nice part in all this is that AI is a great way to solve for AI. Use AI to make sure you understand the problem itself. Ask it to describe its implementation, ask it to slow down, ask about areas in its knowledge base where it may have alternative solutions and how it's deciding which to pick. The best solution is one that understands its problem well. And then, when you're all set, ask AI to help you build out the prompt for you...


You are acting as an assistant developer, not a code reviewer. The following constraint is **absolute and non-negotiable**:

✅ We are using the `GridSpan` property in OpenXML PowerPoint table cell creation.  
✅ I have personally tested `GridSpan` in this project, and it works perfectly.  
✅ There are NO rendering issues, file corruption issues, or repair dialogs in PowerPoint when using `GridSpan` in this specific implementation.  
✅ This decision is final. Do not remove, alter, or propose alternatives to `GridSpan`. Do not mention risks or concerns about it.

Sources that suggest avoiding `GridSpan` are either:  
- Referring to unrelated `WordprocessingML` contexts  
- Based on outdated or ambiguous documentation  
- Associated with unrelated PowerPoint repair cases where `GridSpan` is **not the cause**

For reference, `GridSpan` is valid in PowerPoint OpenXML tables as per the official schema and works in real-world usage.

---

**Your role:**

- **Fix only the requested issues in the code, without touching any part of the `GridSpan` logic or related table cell span logic.**  
- If the fix involves a table, assume that `GridSpan` must be used where appropriate and is part of the correct solution.

---

**If you encounter any issues unrelated to the requested fix, do NOT attempt to "correct" them unless explicitly instructed.  
Focus only on the specific task assigned.**

---

**Task:**  
[Insert your actual task here: e.g., "Fix the alignment bug in the text box placement, but do not modify any table structures."]
