BosWeigh — Weight from a single photo · Aftaab Hussain
Type: Experiment · Deep Dive
Year: 2026
Role: Solo · Research
Status: Work in progress

BosWeigh —
weighing a cow with one photo

A research notebook where I tried to recover a cow's live weight from a single smartphone photograph — using monocular depth estimation, five anatomical keypoints, and a 150-year-old livestock formula. This is the story of what worked, what didn't, and what the numbers are telling me to do next.

Computer Vision DepthPro YOLO Keypoints 3D Reconstruction Work in Progress
1 — Input image: single monocular side view
5 — Keypoints: wither, pinbone, shoulderbone, girth top/bottom
~2.5s — End-to-end: DepthPro + YOLO + geometry on CPU
Annotated cow with front-girth and shoulder-to-pinbone measurements overlaid
One photo in. Length, girth, weight — and a fair amount of doubt — out. This is what the BosWeigh pipeline draws on top of the input.
01 — The Problem

Weight is the one number farmers can't measure

Cattle weight drives almost every economic decision a smallholder farmer makes. It sets drug dosing (a dewormer under-dosed by 20% doesn't work; over-dosed, it can kill). It sets the feed ration (under-feeding a 500kg cow the diet of a 350kg cow is the fastest way to destroy her milk yield). It sets the price at livestock markets, the insurance valuation, and — increasingly — whether a bank will lend against the animal.

And yet almost no smallholder farmer actually knows what their cow weighs. A proper livestock weighbridge costs lakhs and needs civil work. A handheld weight tape is close to accurate but is fiddly, assumes a cooperative animal, and most farmers don't own one. Everybody guesses, and everybody guesses low.

While building GauSwastha, I had a regression head that output weight from an image — but it was a black box bolted onto a multi-task model, trained on field labels that were themselves guesses. I wanted to see whether the geometry of the cow, recovered directly from a photo, could get to an answer that was explainable from first principles. That's what BosWeigh is.

The question

Given one ordinary side-view photo of a cow, can we recover enough 3D information — real-world length, girth, curvature — to estimate live weight from a formula rather than from a learned regressor? And if we can, how close does it actually get?


02 — The idea

What if monocular depth is good enough now?

The classical approach to measuring an animal from an image needs either a depth sensor (LiDAR, Kinect, structured light) or a fixed reference (a calibration board, a known-size object in-frame). Neither is a realistic ask for a farmer in a shed. So every paper I read worked on rigs the farmer will never buy.

Then in late 2024, Apple released DepthPro — a transformer-based monocular depth estimator that predicts metric depth and focal length from a single RGB image, with no calibration and no reference object. That changed the economics of the problem. If I can get reasonable metric depth from the same image the farmer is already taking, then a phone becomes a 3D scanner.

So the research question collapsed to one thing: is the depth quality good enough that 3D distances computed on top of it are accurate enough to feed into a weight formula?

Raw side-view photo of a cow in a concrete yard
Input — an ordinary side-view photo. No depth sensor, no reference object, no calibration.
DepthPro monocular depth estimation output showing cow silhouette with background
DepthPro output — per-pixel metric depth in meters. The cow sits at ~2.2m from the camera; the background fence is clearly farther.

03 — Pipeline

Five keypoints, one depth map, a bunch of geometry

The pipeline is deliberately thin. There are only two learned components: a monocular depth model (frozen, off-the-shelf) and a small YOLOv8 keypoint detector I trained on side-view cow images. Everything else is plain geometry — which means every failure is debuggable.

01
DepthPro — monocular depth + focal length

Run Apple's apple/DepthPro-hf on the raw image. It returns a per-pixel depth map in metres, a predicted focal length, and field of view. No calibration, no rig — we trust the model to recover enough scale to be useful.

02
YOLOv8 keypoint detection

A custom keypoint model locates five anatomical landmarks on the cow's side profile: wither, shoulderbone, pinbone, front-girth top, front-girth bottom. These are the same points a vet would find with their hands.

03
Unproject 2D → 3D

Using the predicted focal length and per-pixel depth, lift each 2D keypoint into a 3D point in camera coordinates. Now the cow lives in metric space — millimetres matter.

04
Arc-length integration

Sample 200 points along each measurement line (e.g. front-girth-top → front-girth-bottom). Read depth at each sample, unproject to 3D, and sum Euclidean distances between consecutive 3D points. This follows the actual curvature of the body instead of cutting a straight line through it.

05
Symmetry-based circumference

A single side view only shows half the animal. I build a plane normal to the shoulder-pinbone axis at the girth line, then mirror the visible half-arc around that plane's depth centreline to estimate the full circumference. Crude, but a starting point.

06
Apply the livestock formula

Convert length and girth to inches, plug into weight = (length × girth²) / 660, and report kilograms. The whole chain runs in about 2.5s on CPU.
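The unprojection in step 03 is plain pinhole geometry. A minimal sketch — the function name matches the later snippet, but this implementation and the example numbers are mine, not from the repo:

```python
import numpy as np

def unproject(us, vs, Zs, fx, fy, cx, cy):
    """Lift pixel coordinates plus metric depth into 3D camera coordinates
    via the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    us, vs, Zs = (np.asarray(a, dtype=float) for a in (us, vs, Zs))
    X = (us - cx) * Zs / fx
    Y = (vs - cy) * Zs / fy
    return np.stack([X, Y, Zs], axis=-1)  # (N, 3) points in metres

# A pixel on the optical axis maps straight to (0, 0, Z); a pixel 750 px to
# the right at depth 2.2 m with fx = 1500 lands 1.1 m to the side.
p = unproject([640.0], [360.0], [2.2], fx=1500.0, fy=1500.0, cx=640.0, cy=360.0)
```

This is where DepthPro's predicted focal length earns its keep: without fx and fy, pixel offsets cannot be converted into metres at all.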

YOLOv8 keypoint detection on side-view cow image
YOLOv8 keypoint detector at 0.96 confidence — coloured dots mark wither, shoulderbone, pinbone, and the two front-girth points. Trained on a small hand-labelled set.

04 — The geometry trick

Straight lines lie; arcs tell the truth

The obvious way to measure "length" on a cow is to take the straight-line 3D distance between shoulder and pinbone. That's what the literature mostly does. It's also wrong — the cow's back is curved, the belly droops, the ribs flare out. A straight line from shoulder to pinbone systematically under-measures.

So instead of one-shot endpoint distances, I sample the line between two keypoints at 200 intermediate pixels, read the depth at each one, unproject to 3D, and sum the consecutive segment lengths. The result is a 3D arc that hugs the body surface — which is what a tape measure does when a vet wraps it around a real cow.
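The chord-versus-arc gap is easy to check on a toy curve. Here the "body surface" is a semicircular bulge of radius 0.35 m (a number I picked, not a measurement): the surface path is π·r, while the straight endpoint-to-endpoint distance is only the diameter 2·r — about 57% shorter.

```python
import numpy as np

# 200 sample points along a semicircular bulge of radius r
r = 0.35  # metres, a hypothetical belly-scale bulge
theta = np.linspace(0.0, np.pi, 200)
pts = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

# sum of consecutive segment lengths ≈ true arc length pi * r
seg_lens = np.linalg.norm(np.diff(pts, axis=0), axis=1)
arc_len = seg_lens.sum()

# the straight chord between the endpoints is just the diameter 2 * r
chord_len = np.linalg.norm(pts[-1] - pts[0])
```

The same integration, run over depth samples instead of a synthetic circle, is what the pipeline does along the girth line.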

Chart showing camera distance varying along the front-girth line, dipping from 2.35m to 2.20m in the middle
Depth along the girth line. The belly bulges out 15cm closer to the camera than the wither — exactly the curvature a straight-line metric would erase.
Chart showing cumulative 3D arc length along the front-girth line growing from 0 to roughly 0.7m
Cumulative 3D arc length. Total visible girth-side ≈ 0.69m; the straight endpoint-to-endpoint distance is noticeably less.
bosweigh / compute_cow_metrics.py
# sample 200 points between two keypoints, unproject each to 3D,
# then integrate segment-by-segment along the body surface
us, vs = sample_line(p_top[0], p_top[1], p_bot[0], p_bot[1], n=200)
Zs = bilinear_sample(depth_m, us, vs)          # metres
pts3d = unproject(us, vs, Zs, fx, fy, cx, cy)  # (N, 3)
seg_lens = np.linalg.norm(np.diff(pts3d, axis=0), axis=1)
arc_len = seg_lens.sum()                       # 3D arc, in metres
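The snippet leans on three helpers whose bodies the post doesn't show. Here is one plausible way to implement them — signatures inferred from the call sites above, so treat this as a sketch rather than the repo's actual code:

```python
import numpy as np

def sample_line(u0, v0, u1, v1, n=200):
    """n evenly spaced (u, v) pixel coordinates along the segment (u0,v0)->(u1,v1)."""
    t = np.linspace(0.0, 1.0, n)
    return u0 + t * (u1 - u0), v0 + t * (v1 - v0)

def bilinear_sample(img, us, vs):
    """Bilinearly interpolate a (H, W) array at float pixel locations (us, vs)."""
    u0 = np.clip(np.floor(us).astype(int), 0, img.shape[1] - 2)
    v0 = np.clip(np.floor(vs).astype(int), 0, img.shape[0] - 2)
    du, dv = us - u0, vs - v0
    return (img[v0, u0] * (1 - du) * (1 - dv)
            + img[v0, u0 + 1] * du * (1 - dv)
            + img[v0 + 1, u0] * (1 - du) * dv
            + img[v0 + 1, u0 + 1] * du * dv)

def unproject(us, vs, Zs, fx, fy, cx, cy):
    """Pinhole unprojection: pixels + metric depth -> (N, 3) camera-space points."""
    return np.stack([(us - cx) * Zs / fx, (vs - cy) * Zs / fy, Zs], axis=-1)
```

A sanity check that falls out of these: on a flat depth plane at 2 m with fx = 1000, a 100-pixel horizontal line should integrate to exactly 0.2 m of metric arc.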

05 — The formula

A nineteenth-century equation still runs the industry

The final step is almost comically simple. There is a formula that cattle traders, veterinarians, and agricultural extension officers have used for well over a century — it's in Indian government livestock handbooks, USDA extension leaflets, and pretty much every animal husbandry textbook:

Heart-girth formula
Weight (kg) = (Length × Girth²) / 660
length and girth in inches · constant 660 calibrated for Bos taurus / Bos indicus

The assumption it encodes is that a cow's body approximates a cylinder whose volume — and therefore mass — scales with the square of the heart girth times the length. It's empirical, but for reasonably-conditioned adult cattle it sits within ~10% of weighbridge numbers when the measurements are taken properly.

So the entire modelling problem reduces to: how well can I measure length and girth from a photo? Everything in the pipeline before this point is in service of getting two numbers right. Weight drops out for free.
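The final step really is two lines of code. A worked example — the 60 in / 70 in measurements are hypothetical, not taken from the test image:

```python
def heart_girth_weight_kg(length_in, girth_in):
    """Heart-girth formula: live weight in kg from body length and
    heart girth, both in inches."""
    return (length_in * girth_in ** 2) / 660.0

# Hypothetical adult cow: 60 in shoulder-to-pinbone length, 70 in heart girth
# 60 * 70^2 / 660 = 294000 / 660 ≈ 445.5 kg
w = heart_girth_weight_kg(60.0, 70.0)
```

Note the asymmetry: an inch of girth error moves the answer roughly twice as much as an inch of length error, because girth is squared.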

Cow with final annotated metrics overlaid including front-girth and shoulder-to-pinbone length
Final overlay — girth arc (front-girth), body-length arc (shoulder → pinbone), and the weight derived from them. Estimated weight on this specific test image: 258.5 kg.

06 — What the numbers say

The results so far are humbling

I validated the pipeline on 42 cows that had ground-truth tape-measured girth and length, and 49 with ground-truth weight. The short version is that the pipeline runs end-to-end, but it is not yet accurate enough to trust. Here's the honest table from the latest run:

Metric        n    MAE      RMSE      R²
Girth (in)    42   20.8     26.5      −22.0
Length (in)   49   43.5     84.2      −139.3
Weight (kg)   42   500.5    1013.6    −392.7
Reading the table

A negative R² means the model does worse than just predicting the mean. That's not a rounding error — it's a signal that the underlying measurement chain is systematically off. The pipeline works mechanically, but the numbers it produces are not yet calibrated to ground truth. This is where the project actually sits today.

The failure modes, in rough order of suspected impact:

DepthPro scale drift

DepthPro predicts metric depth, but the absolute scale varies with scene content — especially for outdoor shots with fences and open sky. A depth scale error propagates linearly into every 3D distance, so a 5% error shifts girth and length by ~5% each — and because girth is squared in the formula, the weight estimate ends up roughly 15% off ((1.05)³ ≈ 1.16).

Keypoint confidence ≠ position accuracy

YOLO reports 0.96 confidence on the bounding box, but the individual keypoints can still be 10–20 pixels off. On the girth line that's a couple of centimetres of real-world error; compounded across five keypoints, the length and girth arcs drift substantially.

Symmetry is a lie

Mirroring the visible half of the cow around a vertical plane assumes perfect left-right symmetry. Real cows sag, lean, and stand on uneven ground. The circumference estimate inherits whatever asymmetry is in the pose.

Ground-truth noise

Tape measurements themselves are noisy — different vets pull the tape at different tensions, and "length" is defined inconsistently across sources. Some of the error I'm seeing is almost certainly in the labels, not the model.
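Because girth enters the formula squared, any girth error is roughly doubled in the weight estimate. A quick numerical check of that amplification — formula from section 05, measurements hypothetical:

```python
def heart_girth_weight_kg(length_in, girth_in):
    """Heart-girth formula: weight in kg from length and girth in inches."""
    return (length_in * girth_in ** 2) / 660.0

L, G = 60.0, 70.0  # hypothetical true length and girth, inches
w_true = heart_girth_weight_kg(L, G)

# Overestimate girth alone by 10%: weight inflates by (1.1)^2 - 1 = 21%
w_bad_girth = heart_girth_weight_kg(L, G * 1.10)
rel_err = w_bad_girth / w_true - 1.0
```

This is why the table above looks as bad as it does: girth MAE of ~20 inches on animals with girths in the 60–80 inch range is a 25–35% relative error, which the squaring turns into weight errors well over 50% before length error is even counted.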


Code is public
Notebook, model weights, and validation data
All on GitHub. Issues, ideas, and pull requests welcome.
Open the repo →