
What HUMAIN will buy in 2026: an outside-in read

Why this analysis, and where it comes from

HUMAIN was formally announced in May 2025 as an AI company wholly owned by Saudi Arabia’s Public Investment Fund (PIF)[^1]. In its first year, it issued a series of partnership announcements worth reading literally: Nvidia as the chip supplier of record[^2], AMD as a complementary compute partner[^3], Cisco for the network fabric[^4], and the development of ALLaM as the flagship Arabic foundation model[^5]. Its operational footprint includes a sovereign cloud, in-Kingdom data centers, and AI research teams distributed between Riyadh and global hubs[^10].

The question every regional annotation vendor is asking today is “where do I enter?” The honest answer requires distinguishing between five fundamentally different spend segments. This post separates them from the outside in, using only public announcements and published executive interviews. I am not inside the procurement cycle. Annota8 is not a HUMAIN vendor of record. I write this because a candid practical read serves a reader looking for a map better than claiming knowledge I do not have.

HUMAIN’s five spend segments

| Segment | Estimated 2026-2027 size | Who takes it | Does a regional annotation vendor enter? |
| --- | --- | --- | --- |
| Data center build-out + chips + infrastructure | Tens of billions USD, multi-year | Nvidia, AMD, Qualcomm, Groq, Cisco, AWS, Saudi construction contractors | No, this is not our game |
| Training data + curation for ALLaM | Mid-to-high 8-figure USD annually | Data + annotation + curation providers | Yes, accessible segment |
| Safety evaluation + red team + behavior annotation | Mid 8-figure to low 9-figure USD over ~24 months | Specialized eval houses + safety experts + prompt authors | Yes, high-value segment |
| Enterprise AI deployment partnerships | Distributed OpEx, low CapEx | System integrators + digital transformation partners | Partially, as an annotation partner, not as an integrator |
| Internal hiring + research infrastructure | Tens of millions USD annually | Arabic NLP research talent + academic partners | No, this is internal infrastructure |

These are estimates built on reasonable industry ratios for a company of this ambition, not disclosures. They err on the conservative side; the chance that actual numbers are higher exceeds the chance they are lower.

Segment 1: data centers and chips — not our game

The largest spend by a wide margin goes to compute infrastructure. The announced Nvidia partnership alone absorbs GPU volumes comparable to major global operators[^2]. AMD[^3] and Qualcomm[^9] serve as diversification sources to reduce single-vendor risk, Groq handles inference workloads[^8], Cisco handles data center networking[^4], and AWS provides cloud capacity[^7]. Add the construction spend for the in-Kingdom data center facilities, cooling systems, power, and physical security — and the conversation is in tens of billions USD over multiple years.

Any regional annotation vendor claiming access to this segment is either misleading or misunderstanding the field. This is not the data annotation business. The suppliers to this segment are chip manufacturers, hyperscale infrastructure integrators, and Saudi construction contractors. The point that matters for us: the size of this segment does not mean there are “crumbs” available for us inside it — it means HUMAIN’s executive and operational focus in 2026 will be concentrated on executing this build-out, which makes procurement cycles for smaller segments more methodical, not less.

Segment 2: ALLaM training data — this is where we enter

ALLaM is the Arabic foundation model HUMAIN is building (after SDAIA developed it in its earlier iterations)[^6]. Each successive training generation requires major training data expansion. This breaks operationally into three layers:

First, pre-training data — vast volumes of cleaned Arabic text, with comprehensive dialect coverage, genre diversity (news, literature, formal, dialogue, technical, scientific), and deep quality processing. This layer is high-volume but low margin on the specialized annotation side; most of the work is curation, cleaning, and deduplication.

Second, Supervised Fine-Tuning (SFT) data — high-quality, hand-crafted instruction-response pairs requiring researchers at academic linguistic level. This layer is much smaller in volume but the value margin per sample is orders of magnitude higher. This is where Cairo’s academic depth outcompetes alternatives.

Third, RLHF preference data — A/B response pairs with deliberate human preference. This layer shapes the model’s personality, safety, and cultural diplomacy. For an Arabic model at ALLaM’s level, the cultural and dialectal nuances in RLHF preferences are critical in a way that does not apply to an English model.

What a regional annotation vendor can credibly offer here: pre-training data curation at volume, SFT pairs at PhD-grade QA, RLHF samples with dialect coverage. The estimated spend in this segment lands in the mid-to-high 8-figure USD range annually: meaningful, large enough to require multiple suppliers, and rewarding for vendors who can pass strict quality gates.
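To make the three layers concrete, here is a minimal Python sketch of what each deliverable might look like as data. Everything in it is an illustrative assumption on my part, not ALLaM's or HUMAIN's actual schema or pipeline: the names `normalize_arabic`, `dedupe_exact`, `SFTExample`, and `PreferencePair` are hypothetical, and the normalization (stripping diacritics and tatweel before hashing) is only one simple choice for exact deduplication.

```python
import hashlib
import re
import unicodedata
from dataclasses import dataclass

# Assumption: treat harakat (U+064B..U+0652), superscript alef (U+0670)
# and tatweel (U+0640) as noise when comparing documents.
_MARKS = re.compile(r"[\u064B-\u0652\u0670\u0640]")


def normalize_arabic(text: str) -> str:
    """Return a comparison key: NFKC form, marks stripped, whitespace collapsed."""
    text = unicodedata.normalize("NFKC", text)
    text = _MARKS.sub("", text)
    return " ".join(text.split())


def dedupe_exact(docs: list[str]) -> list[str]:
    """Keep the first document for each normalized form (exact dedup only)."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        key = hashlib.sha256(normalize_arabic(doc).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


@dataclass(frozen=True)
class SFTExample:
    """One hand-written instruction-response pair (hypothetical schema)."""
    instruction: str
    response: str
    dialect: str  # hypothetical label, e.g. "msa", "egyptian", "gulf"


@dataclass(frozen=True)
class PreferencePair:
    """One RLHF comparison: the same prompt with a preferred and a rejected answer."""
    prompt: str
    chosen: str
    rejected: str
    dialect: str  # hypothetical label, as above
```

The sketch only catches documents that are identical after normalization. Near-duplicate detection, quality filtering, and dialect identification are separate steps that a real curation pipeline would need, and they are where most of the effort goes.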

Read more about our foundation model solutions and the ALLaM training data resource. See also the glossary entries for SFT and RLHF.

Segment 3: safety evaluation and red team — highest value per sample

Any LLM with production ambition at ALLaM’s level needs a deep safety evaluation layer before deployment. This covers several work types:

This segment is smaller in volume than training data but requires an elite tier of workers: language experts, cultural and Islamic studies specialists, cybersecurity experts, legal experts. Likely spend over ~24 months: mid 8-figure to low 9-figure USD. Margin per labor hour is substantially higher than standard annotation segments.

See the glossary entries for eval set construction and jailbreak red team labeling.

Segment 4: enterprise AI deployment — partially accessible

HUMAIN aspires to be an engine for AI deployment across Saudi government and enterprise sectors. This generates a chain of application projects: AI for healthcare, education, government services, energy, financial services. Each application project carries a value chain:

A regional annotation vendor typically enters as the third-layer data provider, sometimes as second-layer fine-tuning partner. Entering as a first-layer SI requires software engineering and project management capabilities most annotation houses do not have. The practical model: partner with Saudi SIs (Elm, Solutions by STC, NTGI) to deliver the data layer for projects they lead, rather than competing with them for the integrator role.

Segment 5: internal hiring + research — not accessible

HUMAIN is hiring Arabic NLP researchers, data scientists, and ML engineers aggressively. This is not a vendor opportunity; it is talent competition. Any regional vendor that has built a substantial researcher layer will lose some of those researchers to HUMAIN in 2026-2027, drawn by higher salaries and sovereign incentives. This is a reality to accept rather than deny, and to plan around by building a deep leadership layer.

Procurement realism: what HUMAIN requires of its vendors

HUMAIN, as a Saudi-designated entity, operates under semi-formal government procurement constraints even as a legally independent company. The practical consequences:

What a small/mid annotation vendor should not claim

Claim integrity is part of long-term commercial survival. What a vendor at our scale should not claim — to HUMAIN or to the market:

What a vendor at our scale can credibly offer

What we actually deliver and what sits within Annota8’s ambition scope:

Read our workforce architecture and data curation methodology.

The honest bottom line

HUMAIN in 2026 will spend the majority of its budget on infrastructure and chips, and no annotation vendor enters that segment. The segments accessible to us — ALLaM training data, safety evaluation, culturally-aware RLHF — are much smaller but they are real, and the value margin per labor hour in them is high. The vendor who enters with modest claims, transparent about what it does not do, builds long-term trust. The vendor that claims hyperscaler capabilities loses credibility in the first conversation.

To repeat: Annota8 is not a HUMAIN vendor of record today. This is general analysis from outside the procurement cycle, built on public announcements. I publish it because honest analysis serves the reader better than claiming knowledge I do not have. If you have information that contradicts anything in this post, I welcome the correction.
