Telco DPI labeling in the Middle East: balancing privacy with operations

Why this topic, now

Every Middle Eastern operator I work with now runs a DPI platform of some kind — Sandvine, Allot, or a sovereign in-house build. The reason is purely operational: 5G plus converged fixed networks plus encrypted entertainment traffic make port-based classification operationally worthless. To know whether a subscriber is consuming Netflix versus cloud gaming versus WhatsApp calls, you need application-layer inference. That is modern DPI.

But the legal landscape has shifted materially in the past two years. Saudi PDPL came into full enforcement in September 2024.[1] UAE PDPL has been in force since January 2022, with the UAE Data Office and executive regulations operationalising through 2023–2024.[2] Egypt’s Personal Data Protection Law 151/2020 finally received its Executive Regulations in November 2025 (Ministerial Decree 816/2025), with a one-year compliance grace period running through late 2026.[3] The Data Protection Centre is the supervisory authority under the Egyptian PDPL; NTRA continues to regulate the telecom sector and is increasingly coordinating on cross-border data flow notifications. DPI data that was collected and labeled freely three years ago now sits under two regulatory layers: horizontal data protection law, and sectoral telecom regulation. The data leads at telcos keep asking me the same question in every meeting: “What can we actually label without crossing a line?”

I wrote this to draw the line I use with Annota8’s telco clients. It is not legal advice, and any labeling vendor who claims they can give you a final legal answer without licensed local counsel is lying to you. But the operational line is known, and this is how I read it.

Three DPI tracks, not one

The biggest mistake I see in newly-formed data teams is treating DPI as a single black box. Operationally, there are three radically different tracks, each with its own legal basis:

Track one — operational machine learning. Goal: classify flows, detect patterns, optimize QoS, predict faults. Basis: legitimate interest plus network-operation necessity. This is what telco clients mean when they ask us to label DPI data, and it is the domain where you can work safely under minimization rules.

Track two — lawful intercept. Goal: execute judicial orders or security requests. Basis: explicit statutory authority plus a specific court order. This track never enters a commercial labeling pipeline. It is handled by specialized teams inside the operator, under compliance-unit supervision, in environments isolated from regular MLOps platforms. If you receive a request from a telco client to label intercept data, refuse. That is not our work.

Track three — commercial policy enforcement. Goal: traffic shaping, behavioral advertising, dynamic plan pricing. This is where the strictest requirements live — explicit consent in most jurisdictions, prior disclosure, and a documented opt-out mechanism. Many regional telcos blend this track with track one inside their own systems, which inflates the legal surface area for no good reason.

As a rule, we require the client to give us written confirmation that the data they send belongs exclusively to track one. That protects both sides.

The list of what can be labeled is the operational heart of a clean DPI pipeline:

Flow records. The header 5-tuple: source/destination IP, source/destination port, protocol. Plus timestamps and packet sizes. This is metadata par excellence — it does not expose content. Label freely to train flow-classification and anomaly-detection models.

Application-layer protocol patterns. Statistical signatures of HTTPS, QUIC, STUN, DNS-over-HTTPS, WhatsApp, Netflix, and the rest. This is where modern Encrypted Traffic Classification lives — inferring application type without decrypting payloads. You label by class (TikTok vs YouTube vs Zoom), without content and without user identity.

Quality-of-service and performance measurements. Latency, jitter, packet loss, effective throughput. Label these with geographic binding at the cell or area level — not the individual-subscriber level.

Aggregated consumption patterns. Average usage in a given cell between 8 and 10 PM. This is aggregation — no individual is targeted.
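A minimal sketch of what a metadata-only label record for the first two items could look like. The class names, field names, and application classes are hypothetical, and the sketch assumes (more conservatively than the text requires) that IP addresses are dropped once flow-level statistics have been computed.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class AppClass(Enum):
    # Illustrative classes only; a real taxonomy would be agreed with the client.
    VIDEO_STREAMING = "video_streaming"
    VOIP = "voip"
    GAMING = "gaming"
    OTHER = "other"


@dataclass(frozen=True)
class FlowFeatures:
    """Metadata-only view of one flow: no payload, no subscriber identifier."""
    protocol: str            # e.g. "TCP" or "UDP"
    dst_port: int
    packet_count: int
    mean_packet_size: float  # bytes
    duration_s: float        # last timestamp minus first timestamp


def summarize_flow(protocol, dst_port, packet_sizes, timestamps):
    """Reduce per-packet sizes and timestamps to flow-level statistics."""
    if not packet_sizes or len(packet_sizes) != len(timestamps):
        raise ValueError("packet_sizes and timestamps must be non-empty and aligned")
    return FlowFeatures(
        protocol=protocol,
        dst_port=dst_port,
        packet_count=len(packet_sizes),
        mean_packet_size=mean(packet_sizes),
        duration_s=max(timestamps) - min(timestamps),
    )


@dataclass(frozen=True)
class LabeledFlow:
    """What the labeler produces: features plus an application class, nothing else."""
    features: FlowFeatures
    label: AppClass


example = LabeledFlow(
    features=summarize_flow("UDP", 443, [1200, 1200, 600], [0.0, 0.02, 0.05]),
    label=AppClass.VIDEO_STREAMING,
)
```

The point of the structure is that there is no field where content or identity could be stored, so a labeler cannot attach either by accident.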

All of the above can be labeled confidently under three conditions:

  1. Strip subscriber identifiers before data leaves the operator’s environment — no IMSI, no MSISDN, no public IP tied to an account.
  2. Geographic aggregation with a k-anonymity floor of k = 50 at the single-cell level. (This is our operational floor — set conservatively against the re-identification literature, not a regulator-mandated threshold under CST, TDRA, or NTRA.)
  3. A documented Data Protection Impact Assessment (DPIA) built into the workflow, not bolted on later.
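Conditions 1 and 2 can be sketched as a gate that runs inside the operator’s environment before anything is exported. This is an illustration under assumptions, not a description of any operator’s system: the field names (`imsi`, `msisdn`, `public_ip`, `cell_id`, `throughput_mbps`) are hypothetical, and reading the k = 50 floor as "at least 50 distinct subscribers per cell" is one plausible interpretation.

```python
from collections import defaultdict

K_FLOOR = 50  # the operational floor from the text, not a regulatory threshold
IDENTIFIER_FIELDS = {"imsi", "msisdn", "public_ip"}  # hypothetical field names


def strip_identifiers(record):
    """Return a copy of the record without subscriber identifier fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}


def aggregate_by_cell(records, k=K_FLOOR):
    """Mean throughput per cell, suppressing cells with fewer than k distinct subscribers."""
    subscribers = defaultdict(set)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        cell = r["cell_id"]
        subscribers[cell].add(r["imsi"])  # used only for counting, never exported
        totals[cell] += r["throughput_mbps"]
        counts[cell] += 1
    return {
        cell: {
            "subscribers": len(subs),
            "mean_throughput_mbps": totals[cell] / counts[cell],
        }
        for cell, subs in subscribers.items()
        if len(subs) >= k
    }
```

Counting distinct subscribers needs an identifier, which is why the aggregation step has to run before export and inside the operator’s perimeter. Only the aggregated output, or records passed through `strip_identifiers`, would leave.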

What never gets labeled in a commercial pipeline

Payload content. Message text, call audio, streamed images. Even if the payload is unencrypted (rare today), labeling it commercially exceeds the legitimate-interest basis in every Gulf and Egyptian jurisdiction. This is lawful-intercept territory — not operational ML.

PII extraction. Phone numbers from payloads, email addresses, personal names, card numbers. Even if these appear by accident in a DNS query or HTTP header, they must be redacted before labeling.

Voice biometrics without explicit consent. This is where many regional telco teams get confused. Labeling a subscriber voiceprint for “identity verification” or “fraud detection” needs explicit written consent. Under KSA PDPL, biometric data is expressly categorised as sensitive personal data;[4] UAE PDPL also classifies biometric data as sensitive, though the law text does not yet impose distinct heightened processing requirements over and above the general sensitive-data regime.[5] Operationally, treat voiceprints as sensitive in both jurisdictions. Do not label call-center voice data without confirming the subscriber consented to recording and to the analytic use, and that consent specifically covers derivative uses (model training, IVR optimization, fraud detection). This is a detail that has wrecked many programs.

Linking subscriber identity to DPI data for advertising. This is a company-policy call, but the operational rule is: do not put it in the labeling pipeline. Keep it in a fully isolated track, with explicit consent, in a segregated database.
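For the PII item above, a redaction pass over strings such as DNS query names or HTTP header values might look like the sketch below. The patterns are deliberately simple and are an assumption of this example: they will miss some formats (for instance card numbers written with spaces) and over-match others, so a production pipeline would need a vetted detector and human review.

```python
import re

# Illustrative patterns only; not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS_RE = re.compile(r"\+?\d{9,19}")  # phone-like or card-like digit runs


def redact(text):
    """Replace email addresses and long digit runs with placeholders.

    Returns the redacted text and the number of replacements made, so the
    caller can log or escalate batches where identifiers were found.
    """
    text, n_email = EMAIL_RE.subn("[EMAIL]", text)
    text, n_digits = LONG_DIGITS_RE.subn("[NUMBER]", text)
    return text, n_email + n_digits


clean, hits = redact("GET /reset?user=ali@example.com&msisdn=+966501234567")
# clean == "GET /reset?user=[EMAIL]&msisdn=[NUMBER]" and hits == 2
```

Returning the hit count alongside the text makes it straightforward to treat any non-zero count as a signal that the upstream separation step let something through.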

The sectoral regulatory layer

PDPL is horizontal — it applies to every sector. Telecoms get an extra layer:

Saudi Arabia — CST (Communications, Space & Technology Commission). Renamed from CITC in November 2022 by Cabinet Decree No. 235 of 1444H, which also transferred space regulatory competencies to the commission, a point that gets garbled in older references.[6] CST imposes additional requirements on operators around data retention and incident reporting. Any labeling pipeline must align with CST’s telecom information-security policies, on top of PDPL.

UAE — TDRA (Telecommunications and Digital Government Regulatory Authority). Has issued cybersecurity guidance for telecoms specifying critical data categories. Labeling that involves data extraction from the network falls under TDRA Standards for Information Assurance, and any cross-border labeling pipeline needs an additional impact assessment.

Egypt — NTRA (National Telecom Regulatory Authority). Under Law 151/2020 (administered by the Data Protection Centre under the Executive Regulations issued via Decree 816/2025), cross-border transfers of personal data require an adequacy assessment and approval; for telecom operators NTRA notification typically applies in parallel under the sectoral licensing regime. Operationally that means moving DPI data from an Egyptian operator to a labeling provider in a third country requires both Data Protection Centre approval and NTRA-side documentation of the flow.

Gulf and Egyptian clients often ask whether we can run the work from Cairo on their data. The practical answer: yes, but only with a signed Data Transfer Agreement that includes minimization clauses, deletion mechanisms, and disclosure of Annota8 as a processor acting on the client’s behalf. I prefer working inside the client’s environment (on-premise or a dedicated VPC) whenever possible, which means no data transfer at all.

What this means for pipeline design

The operational translation for ops teams:

1. Separate at source. We require the client to separate DPI metadata from payload data before anything reaches the labeling platform. If payloads arrive on our platform by mistake, we refuse to process and report.

2. Purpose-limited labeling schema. Every labeling job in a DPI pipeline carries an explicit purpose tag — flow classification, fraud detection, QoS optimization. Reusing the same labels for other purposes requires a signed contract amendment.

3. K-anonymity floor on what labelers see. Before a human labeler sees a batch, it passes through an aggregation gate that ensures no record in it can be tied back to an individual subscriber.

4. Labeler training on disclosure escalation. If a labeler sees a personal identifier in a batch — phone number, email, name — the batch is paused immediately and reported. This is foundational training for any team working on telco data.

5. Periodic purpose review. Every six months we re-confirm with the client: is the original purpose still valid? Have new uses been added? Has regulation moved? Labeling that was legitimate in 2024 may be questionable in 2026.
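Steps 1, 2, and 5 above can be sketched as a single intake check. Everything here is hypothetical naming for illustration (`LabelingJob`, `admit_batch`, the `payload` field, the 182-day interval as an approximation of six months); it shows the shape of the control, not a specific platform’s API.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Purpose(Enum):
    FLOW_CLASSIFICATION = "flow_classification"
    FRAUD_DETECTION = "fraud_detection"
    QOS_OPTIMIZATION = "qos_optimization"


REVIEW_INTERVAL_DAYS = 182  # roughly six months; the exact cadence is a contract detail


@dataclass(frozen=True)
class LabelingJob:
    job_id: str
    purpose: Purpose
    last_purpose_review: date


class IntakeRefused(Exception):
    """Raised when a batch must not enter the labeling platform."""


def admit_batch(job, batch, today):
    """Refuse batches that carry payload fields or belong to a job overdue for review."""
    if any("payload" in record for record in batch):
        raise IntakeRefused(f"{job.job_id}: payload field present, refuse and report")
    if (today - job.last_purpose_review).days > REVIEW_INTERVAL_DAYS:
        raise IntakeRefused(f"{job.job_id}: purpose review overdue")
    # Stamp every record so labels cannot be silently reused for another purpose.
    return [{**record, "purpose": job.purpose.value} for record in batch]
```

Raising an exception rather than silently dropping the offending records matches the "refuse to process and report" rule: the whole batch stops, and the refusal is visible to whoever operates the pipeline.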

Where my responsibility ends and counsel’s begins

I write this as the CEO of Annota8, working day-to-day with telco data teams. The above is the practical reading I use when designing labeling pipelines. But the final interpretation of PDPL, or what CST requires in a specific case, needs licensed local counsel. If you are a chief data officer at a telco reading this, the sensible next step is a written DPIA for every DPI labeling pipeline you run, signed off by in-house legal. That is not a paper you can defer — it is a precondition for operational continuity.

From our side at Annota8, we enter the conversation with a written commitment to minimization, labeler training, and purpose documentation. We do not take projects that involve payloads or identity linkage — even when the budget is attractive. This is not moral hygiene, it is long-horizon risk math. One contract where a vendor is found to have crossed the minimization line is enough to end the relationship with the operator and the regulator simultaneously.

Connecting to the rest of the stack

A closing note: regional telcos have huge capacity to build strong operational models on DPI metadata. The gap is not capability, it is discipline — drawing the line yourself before the regulator draws it for you. The operators who draw it themselves today will compete from a position of strength in 2027, not from a defensive crouch.

Annota8 is in early-stage operations and does not hold formal compliance certifications. Statements about regulatory approach reflect internal design intent, not certified status. Engage qualified local counsel and advisors for any active procurement or regulatory decision.

Footnotes

  1. Clyde & Co, “Saudi Arabia’s Personal Data Protection Law becomes fully enforceable” (Sept 2024). https://www.clydeco.com/en/insights/2024/09/saudi-arabia-s-personal-data-protection-law-become

  2. UAE Government Portal, “Data protection laws in the UAE” — Federal Decree-Law 45/2021 in force since 2 January 2022, with the UAE Data Office and Executive Regulations operationalising through 2023–2024. https://u.ae/en/about-the-uae/digital-uae/data/data-protection-laws

  3. Baker McKenzie, “Egypt: important data protection update” (Jan 2026) — Executive Regulations to Law 151/2020 issued via Ministerial Decree 816/2025 on 1 November 2025, entering force 2 November 2025 with a one-year grace period. https://www.bakermckenzie.com/en/insight/publications/2026/01/egypt-important-data-protection-update ; CMS Law, “Egypt’s PDPL Executive Regulations issued: one-year compliance countdown begins.” https://cms.law/en/are/legal-updates/egypt-s-pdpl-executive-regulations-issued-one-year-compliance-countdown-begins

  4. KSA PDPL Article 1 — biometric data is expressly categorised as sensitive personal data. CookieYes summary of the Saudi Arabia Personal Data Protection Law. https://www.cookieyes.com/blog/saudi-arabia-personal-data-protection-law/

  5. Global Legal Group, “Data Protection — United Arab Emirates”: biometric data is included in the UAE PDPL definition of sensitive personal data, but “there are no specific requirements, restrictions or heightened levels of protection set out in the law in respect of biometric data.” https://www.globallegalpost.com/lawoverborders/data-protection-law-guide-1072382791/united-arab-emirates-1827256483

  6. Communications, Space & Technology Commission (CST), “Who We Are” — CITC renamed to CST by Cabinet Decree No. 235 of 07/04/1444H (10 November 2022), with transfer of space regulatory competencies. https://www.cst.gov.sa/en/about/who-we-are