Clarity Audit - It Never Stopped Running

When every time entry triggered a cascade, Friday meant failure.

Published July 30th, 2025

Each Clarity Audit Story documents a real system I’ve worked on — not as a case study, but as an architectural readout.

The goal isn’t to celebrate fixes. It’s to surface structure.

These aren’t stories of failure. They’re stories of what the system allowed, why it behaved that way, and what changed when it was rebuilt with intent.

The fixes are just a side effect.

Every Friday, the system buckled.

Over 20,000 time entries were logged steadily throughout the day — often one every few seconds. Most came from offshore teams submitting weekly hours in bulk. The volume wasn’t surprising. It was expected.

But each of those entries triggered automation.

Every single one.

The Architecture

The structure seemed logical at first glance:

A time entry updated an hour record
That hour update triggered a recalculation on the associated project
The project update then triggered recalculation on the parent project

This wasn’t incidental behavior — it was wired in. The system used event-based triggers to keep everything in sync. On paper, that looked like automation. In practice, it was a recursive loop with no limit and no boundaries.

Each automation flow was scoped by name — hour logic, project rollup, aggregate flow — but not by execution context.

One process updated another, which kicked off another, which reached back into the first.
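The loop can be sketched the same way. This is an assumed model, not the original flows. `aggregate_flow` writing a value back onto the hour record is a hypothetical example of a flow reaching back into the first one. `MAX_DEPTH` exists only so the demonstration stops.

```python
# Hypothetical sketch of the feedback loop. Each flow is scoped by name
# only; none knows which flow (or how many) invoked it. The write-back in
# aggregate_flow is an assumed example of "reaching back into the first".

MAX_DEPTH = 50  # demonstration-only cap so the sketch terminates


class RunawayCascade(Exception):
    """Raised when the trigger chain nests deeper than MAX_DEPTH."""


def hour_logic(record, depth=0):
    if depth > MAX_DEPTH:
        raise RunawayCascade(f"trigger chain exceeded {MAX_DEPTH} levels")
    record["hour_runs"] += 1
    project_rollup(record, depth + 1)


def project_rollup(record, depth):
    record["rollup_runs"] += 1
    aggregate_flow(record, depth + 1)


def aggregate_flow(record, depth):
    # Stamps a derived value back onto the hour record. To the platform this
    # is just another change, so the hour logic fires again.
    record["stamped_total"] = record["rollup_runs"]
    hour_logic(record, depth + 1)
```

Calling `hour_logic` on a single record never returns normally. The loop stops only because the sketch imposes a cap the original system didn’t have.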

None of this was visible in a single place.

This was a webhook storm: an architecture where every small input is treated as an urgent event, with no boundaries, batching, or orchestration.

By Friday afternoon, the system was saturated.

Operations queued indefinitely. New jobs couldn’t start. Other automations failed outright — not because they were broken, but because the system couldn’t recover.

The only option was to shut everything down manually.

Roll-up logic lagged. Project totals were incomplete. Reporting couldn’t be trusted.

And no one could tell whether the numbers were wrong — or just not finished.

The Fix

The problem wasn’t the number of entries. It was the decision to treat every entry as urgent.

The fix was structural.

Instead of triggering a full chain of calculations for every time entry, I added a custom field to projects: “requires time roll-up.”

When an hour was logged, the related project was flagged — once.

If it was already flagged, nothing happened. Whether one person logged time or a hundred, the result was the same: one flag.

Then, every 12 hours, a scheduled process queried for flagged projects and ran the roll-up logic — once per project, with full context.
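The sweep can be sketched as a plain function that a scheduler calls every 12 hours. The names are hypothetical. Clearing the flag after each roll-up is an assumption here, since how flags were reset isn’t specified. Each total is recomputed from all of a project’s entries in one pass, not adjusted entry by entry.

```python
# Minimal sketch of the scheduled sweep, assuming an in-memory model.
# Names are hypothetical; the platform's scheduler would call sweep()
# every 12 hours.

from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    entries: list = field(default_factory=list)  # logged hours
    requires_time_rollup: bool = False  # hypothetical flag field
    total_hours: float = 0.0


def roll_up(project):
    """Recompute the total from all entries at once (full context)."""
    project.total_hours = sum(project.entries)


def sweep(projects):
    """Roll up each flagged project exactly once; return how many ran."""
    flagged = [p for p in projects if p.requires_time_rollup]
    for project in flagged:
        roll_up(project)
        # Assumption: the flag is cleared once the roll-up succeeds.
        project.requires_time_rollup = False
    return len(flagged)
```

The sweep reads only the flag. Anything that sets `requires_time_rollup` therefore queues a project for the next run, wherever the change came from.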

There was no more reactive triggering.

No more hidden recursion.

Just a timed, visible sweep of the work that actually needed to be done.

The fix wasn’t a workaround. It was a redesign.

We removed the webhook entirely, replaced it with a flag, and ran a controlled batch every 12 hours.

No recursion. No surprise triggers. Just a stable, purpose-driven process.

The Result

50% reduction in total automation operations across the environment
No loss in functionality
No more Friday shutdowns
Roll-up logic became observable and predictable
Data integrity was restored

And the flag became more than a control — it became a feature.

If a team needed to reprocess hours on an older project, they could just reapply the flag. It would be picked up in the next batch.

No overrides. No guesswork. Just signal and wait.

What People Noticed

There were no objections to the delay.

The truth is, no one had been relying on the numbers in the first place.

They’d been wrong for so long — noisy, incomplete, timing-sensitive — that most teams had stopped trusting them entirely.

After the change, nothing felt slower.

What people noticed was that the numbers were finally consistent.

And finally usable.

What This Really Was

This wasn’t suppression. It was realignment.

The platform made it easy to react — to build flows that launched on every change.

But the business process wasn’t reactive. It was batch.

It didn’t need speed. It needed rhythm. And integrity.

The system expected regular input. What it got was thousands of people rushing to close their week.

It wasn’t designed for human behavior — just data flow.

Replacing automatic reaction with buffered control didn’t limit the system.

It made the system finally do what it was meant to do.

Now, it runs on purpose — at the right time, for the right reason.

And the numbers can finally be trusted.
