PDPL compliance for AI training data — the operational guide
Why PDPL matters for AI training data
The Saudi Personal Data Protection Law (PDPL, Royal Decree M/19 of 2021, as amended in March 2023; Implementing Regulations issued September 2023; in force 14 September 2023, grace period ended 14 September 2024)[^1] governs processing of personal data that takes place in KSA or that concerns KSA residents. Unlike many privacy regimes, PDPL applies extraterritorially: a US company processing KSA resident data is subject to PDPL. Its territorial scope under Article 2 is in fact broader than GDPR’s, because it does not require an “offering goods/services” or “monitoring” hook.[^7]
For AI training data specifically:
- Training corpora often include personal data (names, faces, voices, addresses, behavioural signals)
- The annotation workforce processes personal data as part of labelling
- The trained model may produce outputs that re-disclose personal data
- The procurement workflow may involve cross-border data transfers
PDPL has views on all four.
The six PDPL operational dimensions
1. Data residency and sensitive-data handling
PDPL itself does not enumerate a blanket list of personal-data categories that “must remain in-Kingdom.” Three distinct regimes interact here, and conflating them produces misleading conclusions:[^4]
- PDPL cross-border transfer regime — Article 29 of PDPL plus the August 2024 Regulation on Personal Data Transfer Outside the Kingdom govern transfers abroad. Transfers are permitted with specific safeguards (adequacy determination, Standard Contractual Clauses, Binding Common Rules, or explicit consent), subject to a risk-assessment requirement for continuous or large-scale transfers of sensitive data.[^6]
- PDPL sensitive-data categories — health, genetic, biometric, religious belief, ethnic origin, political opinion, and criminal data are subject to stricter consent and transfer controls under PDPL and its Implementing Regulations, but are not absolutely prohibited from leaving the Kingdom where the regime’s safeguards are met.[^6]
- Absolute in-Kingdom rules outside PDPL — government and public-sector data are subject to in-Kingdom localization under the Cloud Computing Regulatory Framework (CCRF) administered by CST. Financial-services data is subject to SAMA’s own localization rules. These are stricter than PDPL itself.[^4]
For AI training datasets touching the sensitive PDPL categories — or any government or financial data — the annotation pipeline normally needs to operate in-Kingdom: either in a KSA cloud region or on-premise in the customer’s KSA facility.
Operational implication: US-hosted SaaS annotation platforms processing sensitive PDPL categories at scale typically need to add documented safeguards (SCCs, adequacy determination, or explicit consent) plus the Article 29 risk assessment. For government or financial data, US-hosted processing is generally not lawful regardless of safeguards; the practical default is sovereign tenancy in a KSA cloud region (for example Google Cloud Dammam or Oracle Riyadh) or an on-premise install. An AWS Bahrain peering arrangement sits outside the Kingdom, so it should be assessed as a cross-border transfer rather than treated as in-Kingdom hosting.
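The routing rule described in this section can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not a legal determination: the `DataProfile` type, the `required_deployment` function, and the category labels are all hypothetical names, and the mapping simply mirrors the three regimes as this guide summarises them.

```python
from dataclasses import dataclass
from enum import Enum


class Deployment(Enum):
    IN_KINGDOM_REQUIRED = "in-Kingdom required"    # government / financial data
    IN_KINGDOM_DEFAULT = "in-Kingdom by default"   # sensitive PDPL categories
    TRANSFER_WITH_SAFEGUARDS = "transfer permitted with safeguards"


# Sensitive categories as listed in this guide (labels are illustrative).
SENSITIVE_CATEGORIES = frozenset({
    "health", "genetic", "biometric", "religious_belief",
    "ethnic_origin", "political_opinion", "criminal",
})


@dataclass(frozen=True)
class DataProfile:
    categories: frozenset
    is_government_data: bool = False
    is_financial_data: bool = False


def required_deployment(profile: DataProfile) -> Deployment:
    """Return the strictest deployment posture the dataset profile implies."""
    # Localization rules outside PDPL are checked first because they are stricter.
    if profile.is_government_data or profile.is_financial_data:
        return Deployment.IN_KINGDOM_REQUIRED
    # Any overlap with the sensitive categories moves the default in-Kingdom.
    if profile.categories & SENSITIVE_CATEGORIES:
        return Deployment.IN_KINGDOM_DEFAULT
    return Deployment.TRANSFER_WITH_SAFEGUARDS
```

Checking the non-PDPL localization rules first keeps the strictest outcome from being masked by a more permissive one.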
2. Article 24 — 72-hour breach notification
Article 24 of the PDPL Implementing Regulations requires notification to SDAIA within 72 hours of becoming aware of a personal data breach where the breach may cause harm to the personal data or conflict with the data subjects’ rights or interests. Affected data subjects must separately be notified without undue delay where the breach poses a risk to their rights or interests.[^2] This is operationally close to GDPR Article 33’s 72-hour clock, with the SDAIA notification track separate from the data-subject track.[^3] The 72-hour window is conditional on potential harm: not every incident triggers the clock, but pipelines should default to assuming it does.
For annotation pipelines, “breach” includes:
- Unauthorized access to annotated datasets
- Workforce member exfiltrating customer training data
- Misconfigured access controls exposing data
- Sub-processor security incidents involving customer data
Operational implication: the annotation platform must have:
- Real-time anomalous-access detection
- Incident response runbook with named on-call roles
- Notification template pre-aligned to SDAIA’s required content
- 24-hour internal escalation SLA to leave margin within the 72-hour external clock
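The relationship between the 24-hour internal SLA and the 72-hour external clock can be made concrete with a small deadline calculator. This is a sketch only: the function names are hypothetical, and it assumes the clock starts at the moment of awareness, recorded as a timezone-aware timestamp.

```python
from datetime import datetime, timedelta, timezone

EXTERNAL_WINDOW = timedelta(hours=72)  # SDAIA notification window from awareness
INTERNAL_SLA = timedelta(hours=24)     # internal escalation target from this guide


def breach_deadlines(aware_at: datetime) -> dict:
    """Compute both deadlines from the moment the breach became known."""
    if aware_at.tzinfo is None:
        # Naive timestamps make cross-region on-call handoffs ambiguous.
        raise ValueError("aware_at must be timezone-aware")
    return {
        "internal_escalation_due": aware_at + INTERNAL_SLA,
        "sdaia_notification_due": aware_at + EXTERNAL_WINDOW,
    }


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left on the external clock; negative once the window has passed."""
    due = breach_deadlines(aware_at)["sdaia_notification_due"]
    return (due - now).total_seconds() / 3600
```

With these values, an incident escalated on time leaves 48 hours to assess harm and prepare the notification.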
3. Lawful basis tracking
PDPL requires processing of personal data to rest on a lawful basis. The Implementing Regulations recognise grounds substantively equivalent to GDPR Article 6: consent, contractual necessity, legal obligation, vital interests, public interest, or legitimate interest (with balancing test).[^1]
For AI training data, the lawful basis question becomes:
- On what basis did the original collector gather this data?
- On what basis does the AI company use it for training?
- Has the lawful basis been documented at the dataset level?
Operational implication: the annotation platform must track lawful basis per dataset. Mixed datasets (consent + legitimate interest spanning different sources) require per-record basis tracking, not just per-dataset.
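Per-record tracking with a dataset-level default might look like the following. This is a minimal sketch, assuming a hypothetical schema (`Record`, `Dataset`, `LawfulBasis`) rather than any particular platform's data model; the enum values follow the grounds listed above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LawfulBasis(Enum):
    CONSENT = "consent"
    CONTRACT = "contractual_necessity"
    LEGAL_OBLIGATION = "legal_obligation"
    VITAL_INTERESTS = "vital_interests"
    PUBLIC_INTEREST = "public_interest"
    LEGITIMATE_INTEREST = "legitimate_interest"


@dataclass
class Record:
    record_id: str
    basis: Optional[LawfulBasis] = None  # per-record override, if any


@dataclass
class Dataset:
    name: str
    default_basis: Optional[LawfulBasis] = None
    records: list = field(default_factory=list)

    def effective_basis(self, record: Record) -> Optional[LawfulBasis]:
        """A record-level basis wins; otherwise fall back to the dataset default."""
        return record.basis if record.basis is not None else self.default_basis

    def records_without_basis(self) -> list:
        """IDs of records that have no documented basis at either level."""
        return [r.record_id for r in self.records
                if self.effective_basis(r) is None]
```

A mixed dataset would leave `default_basis` unset and require every record to carry its own basis, so `records_without_basis()` doubles as an ingestion gate.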
4. Data subject rights
PDPL Article 4 grants KSA residents rights to:
- Be informed about processing
- Access their personal data
- Correct, complete or update inaccuracies
- Request deletion
- Object to processing
- Withdraw consent
- Receive their data in a portable format (data portability)
For AI training data, the rights questions become:
- Can you identify all records about a specific data subject within the training corpus?
- Can you delete them on request?
- Can you propagate deletion downstream (model retraining considerations)?
- Can you provide a portable copy?
Operational implication: the annotation platform must support:
- Search-by-subject-identifier across all annotated datasets
- Bulk surface of all annotated content referencing a subject
- Bulk-edit / delete / export with audit trail
- Notification + sign-off workflow
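The search, surface, and delete capabilities above depend on an index from subject identifier to annotated records. The sketch below is an in-memory illustration only: `SubjectIndex` and its methods are hypothetical, a production system would back this with a database, and the audit log would normally store a pseudonymised identifier rather than the raw one.

```python
from collections import defaultdict
from datetime import datetime, timezone


class SubjectIndex:
    """Maps a subject identifier to the (dataset, record) pairs that reference it."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self.audit_log = []

    def add(self, subject_id: str, dataset: str, record_id: str) -> None:
        self._by_subject[subject_id].add((dataset, record_id))

    def find(self, subject_id: str) -> list:
        """Every record referencing the subject, across all datasets, sorted."""
        return sorted(self._by_subject.get(subject_id, set()))

    def delete_subject(self, subject_id: str, requested_by: str) -> list:
        """Drop the subject from the index and log what was removed."""
        removed = self.find(subject_id)
        self._by_subject.pop(subject_id, None)
        self.audit_log.append({
            "action": "delete",
            "subject_id": subject_id,
            "requested_by": requested_by,
            "records": removed,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return removed
```

The returned list is what a downstream job would use to remove the underlying files and to flag affected models for retraining review.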
5. Sub-processor management
PDPL requires data controllers to ensure sub-processors maintain equivalent protections. For AI training data, sub-processors typically include:
- The annotation platform vendor (Annota8 or a competitor)
- The annotation workforce (in-house, contracted, or platform-managed)
- Cloud infrastructure (the hyperscaler)
- Supporting tools (project management, communication, identity)
Operational implication: annotation vendors must:
- Maintain a complete sub-processor list
- Notify customers of changes in advance
- Give customers a right of objection under the DPA terms
- Flow PDPL obligations through to sub-processors via written contract
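Advance notice of changes presupposes that the vendor can compare the current sub-processor list with a proposed one. The following sketch shows one way to do that; the `SubProcessor` fields and the `diff_sub_processors` function are hypothetical, and the example names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubProcessor:
    name: str
    role: str                     # e.g. "infrastructure", "workforce"
    location: str
    pdpl_terms_in_contract: bool  # obligations flowed down in writing


def diff_sub_processors(current: list, proposed: list) -> dict:
    """Summarise what customers must be told before the proposed list takes effect."""
    cur = {s.name: s for s in current}
    new = {s.name: s for s in proposed}
    return {
        "added": sorted(set(new) - set(cur)),
        "removed": sorted(set(cur) - set(new)),
        # Same name but different role, location, or contract status.
        "changed": sorted(n for n in set(cur) & set(new) if cur[n] != new[n]),
        # Entries that cannot go live until the written flow-down exists.
        "missing_flow_down": sorted(
            s.name for s in proposed if not s.pdpl_terms_in_contract
        ),
    }
```

A non-empty `added` or `changed` list would open the customer objection window; a non-empty `missing_flow_down` list would block the change outright.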
6. Cross-border transfer
PDPL is directionally stricter than GDPR on cross-border transfer, with the August 2024 Regulation on Personal Data Transfer Outside the Kingdom as the principal instrument.[^6] Permitted mechanisms include adequacy determination, Standard Contractual Clauses, Binding Common Rules, and explicit consent. SDAIA per-transfer approval is not required for every mechanism — SCCs and adequacy-based transfers can stand on documented safeguards — but Binding Common Rules and certain sensitive-category continuous transfers require SDAIA engagement and a risk assessment.[^6]
Operational implication: if your annotation workflow involves any data crossing the KSA border (workforce outside KSA, infrastructure outside KSA, customer support outside KSA), document which transfer mechanism applies, complete the Article 29 risk assessment where required, and retain evidence.
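One way to keep this documentation auditable is a structured record per cross-border flow, with a check that lists what is still missing. This is a simplified sketch with hypothetical names (`TransferRecord`, `gaps`): it encodes only the risk-assessment trigger for continuous or large-scale transfers of sensitive data, and does not model SDAIA engagement for Binding Common Rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mechanism(Enum):
    ADEQUACY = "adequacy_determination"
    SCC = "standard_contractual_clauses"
    BCR = "binding_common_rules"
    EXPLICIT_CONSENT = "explicit_consent"


@dataclass
class TransferRecord:
    flow: str                      # e.g. "annotation workforce outside KSA"
    destination: str
    mechanism: Optional[Mechanism]
    sensitive_data: bool
    continuous_or_large_scale: bool
    risk_assessment_ref: Optional[str] = None
    evidence_refs: tuple = ()

    def gaps(self) -> list:
        """Return the documentation gaps for this flow; empty means complete."""
        found = []
        if self.mechanism is None:
            found.append("no transfer mechanism documented")
        if (self.sensitive_data and self.continuous_or_large_scale
                and not self.risk_assessment_ref):
            found.append("risk assessment required but not recorded")
        if not self.evidence_refs:
            found.append("no evidence retained")
        return found
```

Running `gaps()` over every recorded flow gives a simple pre-audit report of which transfers still lack a mechanism, an assessment, or evidence.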
What good PDPL operationalisation looks like
A serious PDPL-aligned AI training pipeline includes:
- In-Kingdom deployment — KSA cloud region or on-premise for sensitive categories and any government / financial data
- 72-hour breach notification workflow — runbook + on-call + template
- Per-dataset lawful basis tracking — enforced at the annotation platform layer
- Data subject rights workflows — search, surface, edit, delete, export, audit
- Sub-processor disclosure + objection rights — standard DPA clauses
- Cross-border transfer documentation — Article 29 mechanism and risk assessment when applicable
- DPO appointed + contactable where required — DPO appointment is conditional, not universal: it is triggered for public entities providing large-scale services involving personal data; controllers whose core activities involve regular and systematic monitoring; and controllers whose core activities involve processing sensitive personal data, per Article 30(2) of PDPL, Article 32(4) of the Implementing Regulations, and SDAIA’s August 2024 Rules for Appointing a Personal Data Protection Officer[^8]
- DPIA for high-risk processing — recommended even where not strictly mandatory
If your current vendor offers fewer than five of these eight, you have PDPL exposure.
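The checklist can be turned into a quick vendor self-assessment. The sketch below is illustrative only: the control identifiers and the `assess_vendor` function are hypothetical, and the threshold of five is this guide's rule of thumb rather than a legal test.

```python
# The eight controls listed above, as illustrative identifiers.
CONTROLS = (
    "in_kingdom_deployment",
    "breach_notification_workflow",
    "lawful_basis_tracking",
    "data_subject_rights_workflows",
    "sub_processor_disclosure",
    "cross_border_transfer_documentation",
    "dpo_where_required",
    "dpia_for_high_risk",
)
THRESHOLD = 5  # rule of thumb from this guide


def assess_vendor(offered: set) -> dict:
    """Count which controls a vendor offers and flag exposure below the threshold."""
    unknown = offered - set(CONTROLS)
    if unknown:
        raise ValueError(f"unknown controls: {sorted(unknown)}")
    missing = [c for c in CONTROLS if c not in offered]
    count = len(CONTROLS) - len(missing)
    return {"offered": count, "missing": missing, "exposed": count < THRESHOLD}
```

The `missing` list is the more useful output in practice, since it names the specific controls to raise with the vendor.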
How Annota8 is approaching this
Annota8’s annotation workflow is being designed with PDPL considerations in mind from day one. Our design intent across the six dimensions:
- KSA cloud sovereign tenancy + on-premise options as deployment patterns
- 72-hour breach notification runbook designed into incident response
- Per-dataset lawful basis tracking as a standard schema requirement
- Data subject rights workflows designed into the platform
- Sub-processor list + right of objection in our DPA template
- DPO role designated, contactable
- DPIA support for high-risk processing engagements
- KSA entity (Annota8 AI LLC, CR 7053890286) registered for in-Kingdom procurement; MISA Entrepreneur License status pending confirmation
PDPL operationalisation is not a checkbox exercise — it requires a documented data flow, breach drills, and a privacy-by-design workflow. We’re sharing what we’ve learned designing toward this for early customers.
PDPL vs GDPR — which applies to your AI workload?
Both can apply simultaneously. Quick decision matrix:
| Data subject | Processing location | PDPL applies? | GDPR applies? |
|---|---|---|---|
| KSA resident | KSA | Yes | No |
| KSA resident | EU | Yes | Yes (Art. 3(1) establishment) |
| EU resident | KSA | Yes (processing in KSA) | Possibly (if Art. 3(2) targeting/monitoring) |
| EU resident | EU | No | Yes |
| US resident | KSA | Yes (processing in KSA); US state laws may also apply | No |
| Mixed multi-jurisdictional | Anywhere | Yes, for KSA records | Yes, for EU records |
GDPR Article 3(1) applies to processing in the context of the activities of an establishment in the EU, regardless of where the data subject is located, so processing by an EU-established controller or processor triggers GDPR.[^9] PDPL applies to processing of KSA residents’ personal data wherever the processing occurs (Article 2).[^7] For multi-jurisdiction AI training corpora, each regime applies to its own subjects, so the annotation platform must support both.