
PDPL compliance for AI training data — the operational guide

Why PDPL matters for AI training data

The Saudi Personal Data Protection Law (PDPL, Royal Decree M/19 of 2021, amended March 2023; Implementing Regulations issued September 2023; in force 14 September 2023, grace period ended 14 September 2024)[^1] governs processing of personal data that takes place in KSA or that relates to KSA residents. Unlike many privacy regimes, PDPL applies extraterritorially: a US company processing KSA residents’ data is subject to PDPL. PDPL’s territorial scope under Article 2 is in fact broader than GDPR’s, because it does not require an “offering goods/services” or “monitoring” hook.[^7]

For AI training data specifically:

PDPL has views on all four.

The six PDPL operational dimensions

1. Data residency and sensitive-data handling

PDPL itself does not enumerate a blanket list of personal-data categories that “must remain in-Kingdom.” Three distinct regimes interact here, and conflating them produces misleading conclusions:[^4]

For AI training datasets touching the sensitive PDPL categories — or any government or financial data — the annotation pipeline normally needs to operate in-Kingdom: either in a KSA cloud region or on-premise in the customer’s KSA facility.

Operational implication: US-hosted SaaS annotation platforms processing sensitive PDPL categories at scale typically need to add documented safeguards (SCCs, adequacy determination, or explicit consent) plus the Article 29 risk assessment. For government or financial data, US-hosted processing is generally not lawful regardless of safeguards; sovereign tenancy in KSA cloud (Google Cloud Dammam, Oracle Riyadh, AWS Bahrain peering) or on-premise install is the practical default.
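As a rough illustration, the placement rule described above can be expressed as a small routing function. This is a minimal sketch, not legal advice: the `DatasetProfile` fields and the `Placement` labels are hypothetical names invented for this example, and the fallback for datasets with no sensitive, government, or financial data is an assumption (such data may still need a transfer mechanism if it leaves KSA).

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    # Labels are hypothetical; they mirror the three outcomes discussed in the text.
    IN_KINGDOM_REQUIRED = "in_kingdom_required"      # government / financial data
    IN_KINGDOM_DEFAULT = "in_kingdom_default"        # sensitive PDPL categories
    OFFSHORE_POSSIBLE = "offshore_possible_with_transfer_mechanism"  # assumption


@dataclass(frozen=True)
class DatasetProfile:
    # Hypothetical flags a data-intake form might capture.
    has_sensitive_categories: bool
    has_government_data: bool
    has_financial_data: bool


def required_placement(profile: DatasetProfile) -> Placement:
    """Map a dataset profile to the deployment posture described in the text."""
    # Government or financial data: offshore processing is generally not lawful.
    if profile.has_government_data or profile.has_financial_data:
        return Placement.IN_KINGDOM_REQUIRED
    # Sensitive categories: in-Kingdom is the normal posture; offshore would
    # need documented safeguards plus a risk assessment.
    if profile.has_sensitive_categories:
        return Placement.IN_KINGDOM_DEFAULT
    return Placement.OFFSHORE_POSSIBLE
```

Checking the strictest condition first means a dataset that is both sensitive and financial resolves to the stricter outcome.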

2. Article 24 — 72-hour breach notification

Article 24 of the PDPL Implementing Regulations requires notification to SDAIA within 72 hours of becoming aware of a personal data breach where the breach may cause harm to the personal data or conflict with the data subjects’ rights or interests, and notification to affected data subjects without undue delay where the breach poses risk to their rights or interests.[^2] This is operationally close to GDPR Article 33’s 72-hour clock, with the SDAIA notification track separate from the data-subject track.[^3] The 72-hour window is conditional on potential harm — not every incident triggers the clock, but pipelines should default to assuming it does.
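The 72-hour clock can be tracked with simple timestamp arithmetic. The sketch below is illustrative only: the function names and the `harm_ruled_out` flag are assumptions made for this example, and the default mirrors the advice above to assume the clock is running unless harm has been affirmatively ruled out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 72-hour window described in the text.
SDAIA_NOTIFICATION_WINDOW = timedelta(hours=72)


def sdaia_deadline(aware_at: datetime, harm_ruled_out: bool = False) -> Optional[datetime]:
    """Return the notification deadline, or None if harm has been ruled out.

    `aware_at` is the moment the organisation became aware of the breach.
    """
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware to avoid clock ambiguity")
    if harm_ruled_out:
        return None
    return aware_at + SDAIA_NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left on the clock; negative once the deadline has passed."""
    return (aware_at + SDAIA_NOTIFICATION_WINDOW - now).total_seconds() / 3600


aware = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
deadline = sdaia_deadline(aware)  # 2025-01-04 09:00 UTC
```

Requiring timezone-aware timestamps is a deliberate choice here: on-call teams spread across regions can otherwise disagree about when the window closes.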

For annotation pipelines, “breach” includes:

Operational implication: the annotation platform must have:

3. Lawful basis tracking

PDPL requires processing of personal data to rest on a lawful basis. The Implementing Regulations recognise grounds substantively equivalent to GDPR Article 6: consent, contractual necessity, legal obligation, vital interests, public interest, or legitimate interest (with balancing test).[^1]

For AI training data, the lawful basis question becomes:

Operational implication: the annotation platform must track lawful basis per dataset. Mixed datasets (consent + legitimate interest spanning different sources) require per-record basis tracking, not just per-dataset.
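One way to model per-record basis tracking is a dataset-level default that individual records can override, with a gate that blocks any record lacking a basis. This is a sketch under stated assumptions: the `Record` and `Dataset` types and both function names are hypothetical, and the enum simply lists the six grounds named above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LawfulBasis(Enum):
    CONSENT = "consent"
    CONTRACT = "contractual_necessity"
    LEGAL_OBLIGATION = "legal_obligation"
    VITAL_INTERESTS = "vital_interests"
    PUBLIC_INTEREST = "public_interest"
    LEGITIMATE_INTEREST = "legitimate_interest"


@dataclass(frozen=True)
class Record:
    record_id: str
    basis: Optional[LawfulBasis] = None  # per-record override, if any


@dataclass(frozen=True)
class Dataset:
    name: str
    default_basis: Optional[LawfulBasis]  # None for mixed-source datasets
    records: tuple[Record, ...]


def effective_basis(dataset: Dataset, record: Record) -> Optional[LawfulBasis]:
    """Per-record basis wins; otherwise fall back to the dataset default."""
    return record.basis if record.basis is not None else dataset.default_basis


def blocked_record_ids(dataset: Dataset) -> list[str]:
    """IDs of records that have no lawful basis and should not be annotated."""
    return [r.record_id for r in dataset.records if effective_basis(dataset, r) is None]
```

For a mixed dataset, leaving `default_basis` as `None` forces every record to carry its own basis, so an untagged record is caught by `blocked_record_ids` rather than silently inheriting one.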

4. Data subject rights

PDPL Article 4 grants KSA residents rights to:

For AI training data, the rights questions become:

Operational implication: the annotation platform must support:

5. Sub-processor management

PDPL requires data controllers to ensure sub-processors maintain equivalent protections. For AI training data, sub-processors typically include:

Operational implication: annotation vendors must:

6. Cross-border transfer

PDPL is directionally stricter than GDPR on cross-border transfer, with the August 2024 Regulation on Personal Data Transfer Outside the Kingdom as the principal instrument.[^6] Permitted mechanisms include adequacy determination, Standard Contractual Clauses, Binding Common Rules, and explicit consent. SDAIA per-transfer approval is not required for every mechanism — SCCs and adequacy-based transfers can stand on documented safeguards — but Binding Common Rules and certain sensitive-category continuous transfers require SDAIA engagement and a risk assessment.[^6]

Operational implication: if your annotation workflow involves any data crossing the KSA border (workforce outside KSA, infrastructure outside KSA, customer support outside KSA), document which transfer mechanism applies, complete the Article 29 risk assessment where required, and retain evidence.
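The documentation step can be backed by a simple gap check over an inventory of data flows. The sketch below simplifies the rules summarised in this section and should not be read as the regulation's actual test: the `DataFlow` fields are hypothetical, and treating Binding Common Rules and sensitive continuous transfers as the only triggers for a risk assessment is an assumption taken from the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TransferMechanism(Enum):
    ADEQUACY = "adequacy_determination"
    SCC = "standard_contractual_clauses"
    BCR = "binding_common_rules"
    EXPLICIT_CONSENT = "explicit_consent"


@dataclass(frozen=True)
class DataFlow:
    # Hypothetical inventory entry, e.g. "offshore QA workforce".
    name: str
    leaves_ksa: bool
    mechanism: Optional[TransferMechanism] = None
    sensitive_continuous: bool = False
    risk_assessment_on_file: bool = False


def documentation_gaps(flow: DataFlow) -> list[str]:
    """List what is still missing before this flow is defensibly documented."""
    if not flow.leaves_ksa:
        return []  # no cross-border transfer, nothing to document here
    gaps = []
    if flow.mechanism is None:
        gaps.append("no transfer mechanism documented")
    # Simplified trigger, per the text: BCR or sensitive continuous transfers.
    needs_assessment = flow.mechanism is TransferMechanism.BCR or flow.sensitive_continuous
    if needs_assessment and not flow.risk_assessment_on_file:
        gaps.append("risk assessment and SDAIA engagement not evidenced")
    return gaps
```

Running this over every flow (workforce, infrastructure, customer support) gives a worklist of missing evidence rather than a pass/fail verdict.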

What good PDPL operationalisation looks like

A serious PDPL-aligned AI training pipeline includes:

  1. In-Kingdom deployment — KSA cloud region or on-premise for sensitive categories and any government / financial data
  2. 72-hour breach notification workflow — runbook + on-call + template
  3. Per-dataset lawful basis tracking — enforced at the annotation platform layer
  4. Data subject rights workflows — search, surface, edit, delete, export, audit
  5. Sub-processor disclosure + objection rights — standard DPA clauses
  6. Cross-border transfer documentation — Article 29 mechanism and risk assessment when applicable
  7. DPO appointed + contactable where required — DPO appointment is conditional, not universal: it is triggered for public entities providing large-scale services involving personal data; controllers whose core activities involve regular and systematic monitoring; and controllers whose core activities involve processing sensitive personal data, per Article 30(2) of PDPL, Article 32(4) of the Implementing Regulations, and SDAIA’s August 2024 Rules for Appointing a Personal Data Protection Officer[^8]
  8. DPIA for high-risk processing — recommended even where not strictly mandatory

If your current vendor offers fewer than five of these, you likely have PDPL exposure.
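For vendor reviews, the eight items can be kept as a plain checklist and scored. This is a minimal sketch: the control identifiers are hypothetical shorthand for the list above, and the threshold of five simply encodes the rule of thumb stated in this section.

```python
# Hypothetical identifiers for the eight controls listed above, in order.
CONTROLS = (
    "in_kingdom_deployment",
    "breach_notification_workflow",
    "lawful_basis_tracking",
    "data_subject_rights_workflows",
    "sub_processor_disclosure",
    "cross_border_transfer_documentation",
    "dpo_where_required",
    "dpia_for_high_risk",
)


def missing_controls(offered: set[str]) -> list[str]:
    """Return the controls a vendor does not offer, in checklist order."""
    unknown = offered - set(CONTROLS)
    if unknown:
        raise ValueError(f"unrecognised controls: {sorted(unknown)}")
    return [c for c in CONTROLS if c not in offered]


def below_threshold(offered: set[str], minimum: int = 5) -> bool:
    """True when the vendor offers fewer than `minimum` of the controls."""
    return len(CONTROLS) - len(missing_controls(offered)) < minimum
```

Rejecting unrecognised identifiers guards against a typo in a vendor questionnaire quietly counting as a control.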

How Annota8 is approaching this

Annota8’s annotation workflow is being designed with PDPL considerations in mind from day one. Our design intent across the six dimensions:

PDPL operationalisation is not a checkbox exercise — it requires a documented data flow, breach drills, and a privacy-by-design workflow. We’re sharing what we’ve learned designing toward this for early customers.

PDPL vs GDPR — which applies to your AI workload?

Both can apply simultaneously. Quick decision matrix:

| Data subject | Processing location | PDPL applies? | GDPR applies? |
|---|---|---|---|
| KSA resident | KSA | Yes | No |
| KSA resident | EU | Yes | Yes (Art. 3(1) establishment) |
| EU resident | KSA | Possibly (if Art. 3(2) targeting/monitoring) | Yes |
| EU resident | EU | No | Yes |
| US resident | KSA | No (PDPL); other US state laws may apply | No |
| Mixed multi-jurisdictional | Anywhere | Yes per KSA records | Yes per EU records |

GDPR Article 3(1) applies to processing in the context of the activities of an establishment in the EU, regardless of where the data subject lives or where the processing physically takes place, so a controller or processor established in the EU triggers GDPR.[^9] PDPL applies to processing of KSA residents’ personal data wherever the processing occurs (Article 2).[^7] For multi-jurisdiction AI training corpora, each regime applies to the records of its own data subjects, so the annotation platform must support both.
