LDBD Dev Log #3 — Before Stacking Growth Features, I Rescored the Leaderboard

The last dev log (#2) was about auditing the whole codebase to hunt down bugs that could quietly corrupt scores or compromise security. With reliability tidied up, the next thing nagging at me was the opposite problem: LDBD was still closer to a static ranking tablethan a place you’d open every day. This round is about building the features that give people a reason to come back. By “growth” I don’t mean ads or vanity metrics — I mean reasons to return: a feed, follows, sharing, chart annotations.

But the most important realization this round was something else: to get people coming back, I first had to fix the score they’d trust. The features came second.

I didn’t jump straight into code, though. I had about seven ideas kicking around, so I first wrote a one-page plan that scored each one on growth impact, build difficulty, and how they depended on each other, then ranked them. Plan before code — and that one page turned out to be what kept the whole thing from drifting.

The Score Itself Was Suspect

Planning made it clear the first thing to fix wasn’t a new feature at all — it was the leaderboard score I already had. Every new part of the product (profiles, the feed, share cards) was going to show “this person’s headline number,” and I didn’t trust that number. If the leaderboard puts the wrong person on top, every surface built on it broadcasts that misplaced trust further. The more features I added, the louder a crooked score would get.

Until then the headline metric was the average score per prediction, and that per-prediction score had a pile of adjustments bolted on — volatility normalization, horizon weighting, a contrarian bonus. Rather than argue from intuition, I checked it against real data: I pulled ~127,000 resolved predictions straight from production and re-scored them several ways (read-only, so nothing live was touched).

The answer was pretty stark. The metric wasn’t measuring forecasting skill — it was measuring how often you’d been riding a volatile asset in a rising market. The top of the leaderboard was owned by baseline bots that only ever say “up,” and if you simply summed the scores, a no-skill bot betting at random could beat a more accurate AI bot. The more you predicted, the bigger your total — volume was drowning out skill.

So I changed the scoring unit itself. Each prediction is now scored as the directional log return. In plain terms: if you called “up” and it rose, you score the size of that move as a positive; if it went the other way, the same size as a negative. Call “down” and the sign flips — a drop scores positive. Then I divide that by the holding period to annualizeit. Annualizing just means “what would this rate be over a year,” which lets a one-day call and a one-month call sit on the same scale. And because the headline is an average, not a sum, predicting more doesn’t help you. The leaderboard number finally means something close to “the annualized performance of following this call.” After re-scoring, the random bots sank to roughly 0%, and the always-up bots sorted themselves by each asset’s actual long-run return.

One honest caveat. A directional edge is inherently small, so it isn’t statistically provable until hundreds of predictions accumulate. The +30%-ish figures on the top AI bots right now have confidence intervals (the range the true number plausibly sits in) that comfortably straddle zero— they can’t yet be told apart from a lucky year.

The usual fix is to keep newcomers off the board until they’ve built up enough of a sample. But then someone who just joined stays invisible for a while and loses interest fast — a new predictor needs to see their own name on the board within about a week to keep playing. So instead of hiding newcomers, I show everyone in the same ranking and convey confidence with three tiers (Rookie / Calibrated / Verified) and the interval itself. Uncertain doesn’t mean hidden; it means shown as uncertain.

The LDBD leaderboard — annualized return as the headline column, with a confidence-tier badge (Rookie / Calibrated / Verified) next to each name. — The rebuilt leaderboard: annualized return is the headline metric, with a tier badge next to each name showing confidence (Rookie / Calibrated / Verified). Older metrics like accuracy and Skill stay as secondary columns.

What the Feed Needed First

The feed was the part I cared about most, but a feed doesn’t shine on its own — you need things to follow, profiles worth visiting, and a hook that keeps you reading. So after the score and before the feed, I laid a bit of groundwork: a few small, quick features.

Profiles got website and social links (X, GitHub, Threads). One lesson there: a user can write their own profile data straight to the database from the browser, so no amount of form validation is safe on its own — it’s bypassable. The real security boundary is at render time: only allow whitelisted platforms and https:// URLs when drawing the link, and a javascript:scheme or a junk key never reaches the page. It’s the opposite of the usual “validate on input” instinct — here the display side is the last line of defense.

I added asset watchliststoo — you could only follow people and bots before; now you can follow a ticker — which is what later fills the feed and notifications with “stuff I care about.” And a learning loop on losses: when your call resolves wrong, the page shows the reasoning from someone who called that same asset and window right. It only uses resolved predictions, so there’s no copycat risk, and the tone is deliberately neutral (“here’s how someone else saw it”). The independent review caught something here too: without excluding the baseline bots that mechanically predict every day, the recommendations would have been mostly filled with bot reasoning.

The most interesting one was share cards. Paste a profile or asset link into X or a chat app and a dynamic preview image shows the track record or the crowd’s sentiment instead of bland text — a viral loop where show-off-worthy content pulls in free traffic. On the first try, Korean asset names came out as □□□ (tofu) on every card, because the renderer had no Korean font. Shipping the whole font was far too heavy, so I fetch only the exact glyphs each card needs(a subset). One more trap: if the font fetch fails and you hand the renderer an empty list, it crashes the whole image (a 500), so on failure I drop the font setting entirely and let it fall back to a default. A broken card breaks the link preview itself, so this was a “must never crash” spot.

From a Table to a Feed

Only after the score was sorted did the growth features make sense. The centerpiece is a reasoning feed— each prediction’s written analysis, flowing like an X (formerly Twitter) or Threads timeline. A ranking table is a one-and-done view; a feed has something new every day.

Resolved prediction cards in the feed — each shows the author handle, direction, reasoning, and the hit result. — Resolved predictions show everything — author, direction, reasoning, and result. The core of the feed.

The thing I worried about most here was LDBD’s core copycatrule. If you show who bet what on a still-open prediction, people can just copy each other’s positions and the game falls apart. So the feed has two kinds of cards. Resolved predictions are already settled, so they show everything — author, direction, reasoning, result (the core of the feed). Open predictions show only the asset type, direction, and reasoning, with the author hidden. The masked cards literally don’t carry author data, so it’s not “we choose not to show it” — there’s nothing there to leak.

Open prediction cards — each labeled 'Author hidden' and showing only the asset, direction, and reasoning. — Open predictions keep the author hidden — only the asset, direction, and reasoning show. The author is revealed once you predict the same asset.

Pagination bit me once. The infinite scroll applies a filter (“reasoning of at least 60 characters”) after fetching a page, so when a page came back mostly filtered out, the feed would either stop early while more existed or skip some cards. I fixed it by having each feed source keep fetching until enough cards pass the filter to fill the page.

Drawing Right on the Chart

The last two were the heaviest. First I replaced the plain closing-price line on the asset page with a candlestick chart (volume, moving averages, RSI). Candles need daily highs and lows, which I’d been throwing away, so I added the columns and backfilled about two years.

On top of that, I added chart annotations. When you post a prediction you can draw support lines, resistance lines, trendlines, and boxes right on the chart to attach your analysis. My first instinct was to just accept image uploads — but that invites ads, memes, and worse, none of it having anything to do with the asset. So instead of images, the shapes are saved as structured data (JSON) anchored to (date, price). There’s no image to moderate, and the coordinates re-render exactly at any zoom. Best of all, an AI bot can just emit the JSON instead of drawing a picture, and its analysis lands on the chart the same way. I wired the rule-based bot to auto-draw support/resistance from recent highs/lows and a moving average.

The drawing UI had its own snag. I started with a transparent canvas laid over the chart to catch clicks, but inside a modal the coordinates were off and nothing would draw no matter where I clicked. Switching to the chart library’s own click event — which hands you the price and date of the clicked point directly — made it work cleanly.

The Same Verification Loop, Again

Every feature followed the same loop: build it on a branch, get an independent AI review each time, eyeball it on a preview deploy, and only then merge. That review earned its keep more than I expected. For instance, the backfill script that filled in chart highs/lows was about to also overwrite a value used in score calculation — the review caught it. Exactly the kind of bug that would have looked fine on screen.

Bugs weren’t only in the code, either. I added a fixed bottom tab bar on mobile (feed, leaderboard, predict, alerts, profile), and on the preview it showed fine while logged out but seemed to vanish the moment I logged in. I spent a while suspecting the code; the cause was elsewhere. Clicking Google sign-in hands you back to a preset redirect URL, and that URL pointed at production, not the preview — so logging in bounced me to the production site, which didn’t have the feature yet. Not a code bug but a “where am I testing this” bug — worth suspecting whenever login is involved in a preview test.

It all comes back to the same conclusion as #2. The dangerous bugs aren’t the ones that throw errors — they’re the ones that are quietly wrong: a score that’s subtly off, something leaking in one place that should be masked, a backfill touching the wrong column. So for each feature I didn’t stop at “the build passed”; I added a step to actually verify it and run it past another model.

That carried the plan’s features to the finish. A feed, watchlists, share cards, candlestick charts — there are clearly more reasons to come back now.

But the best decision of the round wasn’t any of those flashy features — it was going back and rebuilding the score underneath them before stacking anything on top. New features only amplify whatever the score says. If the foundation is crooked, everything built on top leans with it.

Stripping away the fancy adjustments and coming back to one simple, interpretable line of math, I confirmed the plainest thing again: if you’re not clear about what you’re measuring, anything you build on top comes out crooked. And “score” and “skill” are not, in the end, the same word.