H3 Sharding: Why Primes Compose and Composites Don't

The previous post built an adaptive frontier: 32,740 non-overlapping H3 cells, each holding at most 10,000 places, covering 72.8M places globally. Those cells are work units ready to queue.

Now the question: how do you split those cells across N parallel workers without overlap?

The simplest possible scheme:

worker_id = h3_cell_index % N

Worker i processes every cell where cell::bigint % N = i. No coordination, no distributed lock, no queue churn. Each worker is fully independent and restartable.
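
As a minimal sketch of the same rule outside the database, assuming cells arrive as H3 hex strings: the helper name `worker_for` and the sample index strings are illustrative and not part of the benchmark code.

```python
def worker_for(cell_hex: str, workers_num: int) -> int:
    """Return the worker ID in [0, workers_num) that owns this cell."""
    # An H3 index is a 64-bit integer that is usually printed as hex.
    return int(cell_hex, 16) % workers_num


# Illustrative index strings; only their integer value matters here.
cells = ["8928308280fffff", "8928308280bffff", "85283473fffffff"]
workers_num = 3

# Each worker filters the full list on its own; no shared state is needed.
slices = {
    w: [c for c in cells if worker_for(c, workers_num) == w]
    for w in range(workers_num)
}

# Every cell lands in exactly one slice.
assert sum(len(s) for s in slices.values()) == len(cells)
```

Because the assignment is a pure function of the cell index, a restarted worker recomputes exactly the same slice.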

But does N matter? The initial instinct was to reach for primes — they're the canonical "safe" choice for hash distributions. The benchmark confirmed something, but not what I expected: the real distinction isn't prime vs composite at all.

Setup

We use places_h3_t10000, the adaptive frontier at threshold 10,000: 32,740 cells at resolutions 1–9, covering 72,783,221 places. Worker assignment in Postgres:

cell::bigint % {workers_num} AS worker_id

H3 cell indexes have bit 63 = 0 (reserved, always zero for valid cells), so the bigint is non-negative. Worker IDs are always in [0, N).

We tested 14 values of N: primes and composites, small and large, even and odd:

primes:     2, 3, 5, 7, 13, 29
composites: 4, 6, 8, 15, 16, 21, 25, 35

The odd composites (15=3×5, 21=3×7, 25=5², 35=5×7) are the key addition — they let us separate "composite" from "even" as the actual failure condition.

The validation check for each N: do all IDs from 0 to N-1 get at least one cell?

WITH per_worker AS (
  SELECT cell::bigint % {workers_num} AS worker_id
  FROM places_h3_t10000
  GROUP BY worker_id
),
expected AS (
  SELECT generate_series(0, {workers_num} - 1) AS worker_id
)
SELECT
  count(e.worker_id)                                                             AS expected_workers,
  count(p.worker_id)                                                             AS active_workers,
  count(e.worker_id) - count(p.worker_id)                                        AS idle_workers,
  CASE WHEN count(p.worker_id) = count(e.worker_id) THEN 'PASS' ELSE 'FAIL' END  AS check_result,
  array_agg(e.worker_id ORDER BY e.worker_id) FILTER (WHERE p.worker_id IS NULL)  AS idle_ids
FROM expected e
LEFT JOIN per_worker p ON e.worker_id = p.worker_id
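
The same check can be mirrored in a few lines of Python, which is handy for testing a candidate N before touching the database. This is a sketch under assumptions: `check_coverage` is a hypothetical helper, the inputs are toy integers rather than benchmark data, and cells are assumed to be non-negative integers.

```python
def check_coverage(cells, workers_num):
    """Report which worker IDs in 0..workers_num-1 receive at least one cell."""
    active = {cell % workers_num for cell in cells}
    idle = sorted(set(range(workers_num)) - active)
    return {
        "expected_workers": workers_num,
        "active_workers": len(active),
        "idle_workers": len(idle),
        "check_result": "PASS" if not idle else "FAIL",
        "idle_ids": idle,
    }


# Toy inputs, not benchmark data.
spread = list(range(100))    # hits every residue for small N
clumped = [7, 15, 23, 31]    # all congruent to 7 modulo 8

print(check_coverage(spread, 7)["check_result"])
print(check_coverage(clumped, 8))
```

Unlike the SQL version, this sketch returns an empty list rather than NULL for `idle_ids` when every worker is active.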

Results

N	type	active	idle	check	cv_places	imbalance_ratio
2	prime	1	1	FAIL	—	—
3	prime	3	0	PASS	0.0083	1.006
4	composite	1	3	FAIL	—	—
5	prime	5	0	PASS	0.0140	1.015
6	composite	3	3	FAIL	—	—
7	prime	7	0	PASS	0.0105	1.011
8	composite	1	7	FAIL	—	—
13	prime	13	0	PASS	0.0264	1.037
15	composite	15	0	PASS	0.0314	1.048
16	composite	1	15	FAIL	—	—
21	composite	21	0	PASS	0.0195	1.030
25	composite	25	0	PASS	0.0348	1.065
29	prime	29	0	PASS	0.0316	1.072
35	composite	35	0	PASS	0.0418	1.072

cv_places = stddev/mean of places per worker. imbalance_ratio = max/mean, i.e. how much more work the busiest worker carries than the average worker. Balance metrics are only meaningful for passing configs, so failing rows show a dash.

Two things jump out: N=2 is prime and fails. N=15, 21, 25, 35 are composites and pass with distribution indistinguishable from primes of the same size.

Figure: Los Angeles area split across N=3 workers. Each color is one worker's cells, non-overlapping by construction. The dense urban core subdivides into finer hexes; sparse outskirts stay coarse. No coordination is needed: the H3 index modulo does all the work.

Per-worker detail: pass vs fail

N=7 (prime, PASS) — all 7 workers receive cells, tightly balanced:

worker_id	cells	places
0	4,636	10,516,175
1	4,695	10,326,536
2	4,673	10,235,796
3	4,696	10,503,856
4	4,633	10,418,204
5	4,723	10,477,655
6	4,684	10,304,999
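
The two balance metrics can be recomputed from the N=7 table above. A minimal sketch, assuming the sample standard deviation (n − 1 in the denominator); with that estimator the numbers come out at the 0.0105 and 1.011 shown in the results table, though whether the benchmark used exactly this estimator is an assumption.

```python
from statistics import mean, stdev

# Places per worker for N=7, copied from the table above.
places = [
    10_516_175, 10_326_536, 10_235_796, 10_503_856,
    10_418_204, 10_477_655, 10_304_999,
]

avg = mean(places)
cv_places = stdev(places) / avg        # sample standard deviation / mean
imbalance_ratio = max(places) / avg    # busiest worker vs average worker

print(f"cv_places={cv_places:.4f} imbalance_ratio={imbalance_ratio:.3f}")
```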

N=8 (composite, FAIL) — workers 0–6 idle; worker 7 handles everything:

worker_id	cells	places
0	0	0
…	0	0
7	32,740	72,783,221

N=6 (composite, FAIL) — only odd IDs receive work:

worker_id	cells	places
0	0	0
1	10,972	24,338,838
2	0	0
3	10,920	24,031,744
4	0	0
5	10,848	24,412,639

N=21 = 3×7 (composite, PASS) — all 21 workers active, even spread:

worker_id	cells	places
0	1,521	3,476,285
1	1,553	3,424,652
2	1,540	3,479,906
…	~1,560	~3,470,000
20	1,555	3,436,357

N=21 (3×7) distributes as cleanly as any prime of the same size. The composite structure doesn't hurt it at all.

Why even N fails

H3 cells are 64-bit integers. The format packs a resolution field and a chain of base-7 digits — each resolution level contributes one 3-bit digit (values 0–6). For a cell at resolution r, positions r+1..15 are filled with the invalid digit sentinel: 7 = 111 in binary.

Our adaptive view has cells at resolutions 1–9, so every cell has at least 6 unused digit positions (levels 10–15), each contributing 3 bits of all-ones — at least 18 bits locked to 1 (more for coarser cells; 18 is the minimum, shared by all). (The sentinel value itself is documented in the H3 spec; the consequence for modulo sharding is not.)

SELECT cell::bigint & ((1::bigint << 18) - 1) AS low18_bits, count(*)
FROM places_h3_t10000
GROUP BY 1;

 low18_bits | count
------------+-------
     262143 | 32740

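All 32,740 cells have identical low 18 bits: 262143 = 2^18 − 1. Every cell in the table is therefore an odd number. The same holds for any H3 cell below resolution 15, because its last digit slot always holds the sentinel 7; only resolution-15 cells, which use all 15 digits, can be even.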
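
To see the sentinel digits directly, here is a small decoding sketch. It assumes the documented H3 layout (resolution in bits 52–55, fifteen 3-bit digits in bits 0–44, digit 15 in the lowest three bits); the function names and the sample index string are illustrative.

```python
def h3_resolution(index: int) -> int:
    """Read the 4-bit resolution field stored in bits 52-55."""
    return (index >> 52) & 0xF


def h3_digits(index: int) -> list[int]:
    """Return the 15 three-bit digits, coarsest (digit 1) first."""
    return [(index >> (3 * (15 - pos))) & 0b111 for pos in range(1, 16)]


cell = int("8928308280fffff", 16)   # illustrative resolution-9 index
res = h3_resolution(cell)
digits = h3_digits(cell)

print(res)                          # resolution of the cell
print(digits[res:])                 # digit slots res+1..15 (unused slots)
print(cell & ((1 << 18) - 1))       # the low 18 bits, as in the SQL above
```

For this index the unused slots are all 7, which is exactly what pins the low 18 bits to ones.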

This is why even N fails — but the failure goes deeper than parity. For N = 2^k (a power of two), cell % N reads only the low k bits, which are all fixed to 1. So every cell maps to exactly one worker ID: 262143 % N. For N=2 → worker 1. For N=8 → worker 7. For N=16 → worker 15. Workers 0–6 are idle for N=8 not because they're even-numbered, but because every cell lands on residue 7.

The general rule follows from gcd. Every cell in the table can be written as 2^18 × q + 262143, where q holds the variable high bits. For N = 2^k × m (m odd, k ≤ 18), gcd(2^18, N) = 2^k, so the variable part 2^18 × q can produce only N / 2^k = m distinct residues modulo N. Effective workers = odd part of N = m.

N	odd part	active
4 = 4×1	1	1
6 = 2×3	3	3
8 = 8×1	1	1
16 = 16×1	1	1
15 = 1×15	15	15
21 = 1×21	21	21
35 = 1×35	35	35

Odd composites have odd part = themselves, so they fully activate all workers — same as primes.
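
The rule is easy to check without the database. The sketch below uses synthetic stand-ins for cell indexes (arbitrary high bits, low 18 bits fixed to ones); they are not real H3 cells, and `odd_part` is a helper introduced here for illustration.

```python
def odd_part(n: int) -> int:
    """Strip every factor of two from n."""
    while n % 2 == 0:
        n //= 2
    return n


LOW_ONES = (1 << 18) - 1  # 262143, the fixed low 18 bits

# Synthetic stand-ins for cell indexes: variable high bits, fixed low bits.
cells = [(q << 18) | LOW_ONES for q in range(1000)]

for n in (2, 4, 6, 8, 16, 15, 21, 35):
    active = sorted({c % n for c in cells})
    # The number of active workers equals the odd part of N.
    assert len(active) == odd_part(n)
    shown = active if len(active) <= 3 else f"{len(active)} ids"
    print(n, odd_part(n), shown)
```

For N=6 the active IDs come out as 1, 3 and 5, and for N=8 only ID 7 is active, matching the per-worker tables above.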

Balance: composites match primes

For passing configs at comparable sizes:

N	type	imbalance_ratio	cv_places
13	prime	1.037	0.0264
15	composite (3×5)	1.048	0.0314
21	composite (3×7)	1.030	0.0195
25	composite (5²)	1.065	0.0348
29	prime	1.072	0.0316
35	composite (5×7)	1.072	0.0418

N=21 is actually more balanced than N=29, despite being composite. Once N is odd, the factorisation doesn't drive the imbalance — the geographic distribution of places across the globe does, and that's beyond the control of N.

The practical rule: use an odd N ≥ 3. Primes are the safe, no-thought choice. Odd composites work equally well — if your infrastructure gives you 21 workers naturally (say, 3 nodes × 7 processes), don't feel you need to round to the nearest prime.

Composing across two dimensions: workers × time

Once you have an odd prime N for spatial sharding, you can add a second independent dimension — say, a daily batch — using another prime.

-- p = 7 workers, q = 29 day-cycle
-- Day d (0..28), Worker w (0..6)
WHERE cell::bigint % 29 = {day}
  AND cell::bigint % 7  = {worker}

Two separate modulo conditions. No join, no coordination. Each cell either satisfies both or it doesn't.

This works because 7 and 29 are coprime (gcd = 1). By the Chinese Remainder Theorem, each pair (cell % 29, cell % 7) corresponds to exactly one value of cell % 203, so all 29 × 7 = 203 sub-shards are reachable and the split behaves like a single modulo by the odd number 203. Each worker on each day gets ~161 cells (32,740 / 203) and ~358K places (72.8M / 203).
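
The CRT argument can be checked by brute force. In this sketch `sub_shard` is a hypothetical helper, not part of the benchmark, and the sample index string is illustrative.

```python
DAYS, WORKERS = 29, 7

# Each residue modulo 203 yields a distinct (day, worker) pair.
pairs = {(x % DAYS, x % WORKERS) for x in range(DAYS * WORKERS)}
assert len(pairs) == DAYS * WORKERS


def sub_shard(cell: int) -> int:
    """Hypothetical helper: one shard number equivalent to the pair."""
    return cell % (DAYS * WORKERS)


# Filtering on both modulo conditions selects the same cells as
# filtering on the single sub-shard number.
cell = int("8928308280fffff", 16)
s = sub_shard(cell)
assert (s % DAYS, s % WORKERS) == (cell % DAYS, cell % WORKERS)
```

Since 203 is odd, the two-condition filter inherits the same coverage guarantee as any single odd N.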

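Why primes matter here more than just being odd. Two odd composites can share a factor. If you used N_days=15 (3×5) and N_workers=21 (3×7), gcd = 3. Then cell % 15 = 0 AND cell % 21 = 1 has no solution: % 15 = 0 forces cell % 3 = 0, but % 21 = 1 forces cell % 3 = 1. A (day, worker) pair can receive cells only when day % 3 = worker % 3, so just 105 of the 15 × 21 = 315 pairs are ever populated. No cell is lost, since each cell still maps to exactly one pair, but two thirds of the scheduled slots run empty and the populated third carry roughly three times the expected load. The pattern is deterministic rather than random, so the same slots stay empty on every cycle.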
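
A short enumeration makes the difference concrete. `reachable_pairs` is a helper introduced here for illustration; it lists every (day, worker) combination that some integer can actually produce.

```python
from math import gcd


def reachable_pairs(days: int, workers: int) -> set[tuple[int, int]]:
    """Enumerate the (day, worker) pairs that some integer maps to."""
    # The pair repeats with period lcm(days, workers).
    lcm = days * workers // gcd(days, workers)
    return {(x % days, x % workers) for x in range(lcm)}


coprime = reachable_pairs(29, 7)    # distinct primes
shared = reachable_pairs(15, 21)    # both divisible by 3

print(len(coprime), "of", 29 * 7)
print(len(shared), "of", 15 * 21)

# With a shared factor of 3, a pair is reachable only if both
# coordinates agree modulo 3.
assert all(d % 3 == w % 3 for d, w in shared)
```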

Distinct primes are always coprime. Use primes for any dimension you want to compose independently, and pick different primes per dimension:

dimensions	moduli	combined shards
workers only	7	7
workers × daily	7 × 29	203
workers × daily × weekly sweep	7 × 29 × 11	2,233

Each additional prime multiplies the granularity without breaking any existing dimension's distribution. A cell's position in a 3D prime grid is fully determined by (cell % 29, cell % 7, cell % 11), and each coordinate is independent.

Alternative: sharding on place ID instead of H3 cell

The same modulo pattern works on the primary key — place_id::bigint % N = worker_id — and sidesteps the odd-N constraint entirely. Sequential and random IDs don't share H3's sentinel bit pattern, so any N distributes cleanly.

The tradeoff is proximity. Sharding by H3 cell keeps geographically close places on the same worker. That matters for tasks that need to relate nearby places: duplicate detection, geocode QA, brand clustering, ML features using spatial neighbors. Splitting by ID scatters them randomly, so cross-boundary pair detection becomes a cross-worker coordination problem instead of a local gridDisk lookup.

If each place can be processed independently — normalization, enrichment, format conversion — ID-based sharding is simpler. If spatial locality matters, stay with H3 cells and the gridDisk overlap pattern from the previous post.

The worker pattern

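import os
import psycopg2

WORKERS_NUM = int(os.environ["WORKERS_NUM"])  # odd, ≥ 3
WORKER_ID   = int(os.environ["WORKER_ID"])    # 0 .. WORKERS_NUM-1

# Fail fast on configurations that would leave workers idle or out of range.
if WORKERS_NUM < 3 or WORKERS_NUM % 2 == 0:
    raise ValueError("WORKERS_NUM must be odd and >= 3")
if not 0 <= WORKER_ID < WORKERS_NUM:
    raise ValueError("WORKER_ID must be in 0 .. WORKERS_NUM-1")

conn = psycopg2.connect(...)  # connection parameters elided
cur  = conn.cursor()

# %% is a literal % for psycopg2; %(n)s and %(id)s are bound parameters.
cur.execute("""
    SELECT cell, place_count
    FROM places_h3_t10000
    WHERE cell::bigint %% %(n)s = %(id)s
    ORDER BY place_count DESC
""", {"n": WORKERS_NUM, "id": WORKER_ID})

# process_cell is the per-cell job, defined elsewhere.
for cell, place_count in cur:
    process_cell(cell, place_count)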

No shared state. Restart a failed worker with the same env vars and it processes the identical set of cells.

With boundary overlap

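import h3  # h3-py 4.x

for cell, place_count in my_cells:  # rows from the worker query above
    # grid_disk(cell, 1) already includes the home cell plus its ring-1 neighbors
    candidates = load_places_in(conn, h3.grid_disk(cell, 1))
    results    = find_matches(candidates)
    emit_home_only(results, cell)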

Conclusion

The real distinction isn't prime vs composite — it's odd vs even.

effective workers = odd part of N

Odd N (prime or composite): all IDs 0..N-1 receive cells, and imbalance_ratio stays at or below 1.072 for every odd N tested (3 to 35)
Even N: the power-of-two factor of N contributes nothing, so only odd_part(N) workers are active; which IDs are active depends on N, not simply on parity
N=2: fails even though it's prime — the only even prime is not exempt
Odd composites (15, 21, 25, 35): pass and balance as well as primes of the same size
Composing dimensions (workers × time): use distinct primes per axis — coprime modulos are independent by CRT, sub-shards stay uniform. Composites sharing a factor silently empty some (day, worker) pairs

When sizing your worker pool, pick any odd number ≥ 3. If you plan to add a time dimension later (daily batches, weekly sweeps), use primes so each axis stays coprime and composable.

Benchmark code: overturemaps-pg/pages/h3-parallel-workers. Full results: results_2026-05-18_220518.md. PostgreSQL 17, PostGIS 3.5, h3-pg 4.x, Overture Maps dataset (72.8M places).