Faiz Vadakkumpadath March 2026 8 min read

    NULL vs Access Denied: The Gap in SQL That's Silently Breaking Your Reports

    When I was building data security systems for Amazon's enterprise data lake (likely the largest data lake in the world), I encountered a problem that every major data platform still gets wrong. It's deceptively simple, but the consequences are everywhere.

    The Problem

    You run a query:

    SELECT employee_name, salary, ssn FROM employees WHERE department = 'Engineering';

    And you get back:

    employee_namesalaryssn
    Alice185000NULL
    BobNULLNULL

    Is Bob's salary actually NULL in the database, or were you denied access to it? Is Alice's SSN empty, or is a security policy hiding it from you?

    The problem is that nobody can tell.

    Every major cloud data platform - AWS Lake Formation, Snowflake, Databricks Unity Catalog, BigQuery - implements some form of column-level, row-level, or cell-level security. When a user lacks permission to see certain data, the system has to do something. And right now, every system does something different, none of them do it well, and the SQL standard has nothing to say about it.

    Three States, One Symbol

    Here's what makes this tricky. There are actually three distinct states a cell can be in:

    1. The value exists and you can see it. Normal case, no problem.

    2. There is no value. The classic database NULL. Bob genuinely has no middle name. The SSN field was never populated. This is the "missing data" that SQL's NULL was designed for.

    3. You are not allowed to know. Access Denied. The value might exist, it might not - and that's the point, you don't get to find out. Your security policy says this cell is off-limits to you.

    States 2 and 3 are fundamentally different. NULL says "there's nothing here." Access Denied says "there might be something here, but it's none of your business." The critical distinction: with NULL, you know the data is absent. With Access Denied, you don't even get to know whether data is present.

    Yet every platform represents both with the same symbol: NULL.

    This isn't a new observation. E.F. Codd, the inventor of the relational model, proposed splitting NULL into two types back in 1979 - one for "unknown" data and one for "inapplicable" data. The SQL standards committee rejected it as too complex. Even Codd's proposal didn't cover our third state. His two NULLs were both about why data is missing. Access Denied is about whether you're allowed to know if data exists at all, which is a different dimension entirely.

    What Happens Today

    When a user's query touches data they can't see, the platform has three options. None are great.

    Option 1: Fail the Query

    Reject it outright.

    ERROR: Access denied. Column 'salary' requires role 'compensation_viewer'.

    Who does this: Google BigQuery, PostgreSQL, Apache Hive with Ranger.

    Safe and honest. But a user with access to 49 out of 50 columns can't run SELECT * at all. In BigQuery, you have to write SELECT * EXCEPT (ssn, salary) -- but you need to already know which columns are restricted before writing the query. In large companies with hundreds of tables and evolving policies, this generates a constant stream of support tickets.

    Option 2: Replace With NULL (or a Sentinel)

    Return the result set, but swap in NULL or a placeholder for denied values.

    Who does this: Snowflake (masking policies), Databricks Unity Catalog (column masks), Apache Ranger (nullify/hash/partial), Oracle Data Redaction, SQL Server DDM.

    Queries don't fail, which is nice. But if you substitute with NULL, the result is semantically indistinguishable from real missing data. An analyst reporting "percentage of customers with ZIP on file" gets wrong numbers. A pipeline checking WHERE salary IS NOT NULL silently drops rows it shouldn't. The ambiguity propagates downstream through every transformation, join, and aggregation.

    One interesting exception: SQL Server's DDM masks non-NULL values with patterns (strings become XXXX, numbers become 0) but passes actual NULLs through unmasked. So you can tell NULL from masked, but this leaks whether data exists, which may itself be sensitive. And DDM has deeper problems. Microsoft's own docs warn it won't stop "exhaustive queries," and researchers have brute-forced a million SSNs in under a second.

    Option 3: Hide the Column Entirely

    Remove it from the result set schema as if it doesn't exist.

    SELECT * FROM employees;
    -- User A sees: employee_name, salary, ssn, department
    -- User B sees: employee_name, department  (salary and ssn gone)

    Who does this: AWS Lake Formation, Apache Ranger with Spark, Trino (configurable).

    Clean from a privacy standpoint. But SELECT * now returns different schemas for different users. Pipelines break when column counts don't match. Two analysts comparing results get different shapes and neither understands why. Trino's own docs note this makes queries "non-deterministic."

    A useful nuance: This works better if you only apply it to SELECT *. When someone writes SELECT salary FROM employees, they explicitly asked for that column and fail with an error. When they write SELECT *, they're exploring and it silently excludes what they can't see. Some systems draw this distinction, but most don't.

    Everyone Configures, But Nobody Standardizes

    Several platforms let administrators choose the behavior:

    • Trino: toggle between error and silent omission
    • Snowflake: masking policies are arbitrary SQL UDFs - return NULL, a string, a hash, anything
    • Ranger: multiple masking types - Nullify, Hash, Show-last-4, Custom

    But configurability without standardization just means every organization or team invents their own convention. Is NULL in this table "no data" or "masked"? Depends on who set up the policy and which system you're on.

    The Standards Gap

    The ANSI SQL standard (through SQL:2023) defines NULL as "unknown or inapplicable" with three-valued logic. It has nothing to say about access control semantics in query results.

    Same story at the data format level. Apache Arrow supports null bitmaps but no "reason codes." Parquet and ORC have no per-cell access metadata. Iceberg and Delta Lake delegate access control entirely to the query engine.

    Researchers have studied this. Wang et al. (VLDB 2007) formally proved that replacing denied values with NULL breaks query correctness. Guarnieri & Basin (VLDB 2014) went further, proving that for general SQL, no perfect algorithm exists for security-aware query processing. You can be safe, correct, and maximally informative, but not all three at once. Every system must choose what to sacrifice.

    That theoretical limit partly explains why nobody has standardized a solution. But it doesn't excuse the lack of a good enough standard. We don't need perfection, but we need a shared vocabulary.

    What We Need

    The industry needs to distinguish three cell states, not two:

    StateMeaningWhat User Learns
    ValueData is presentThe actual data
    NULLData is absent"There is nothing here"
    DENIEDAccess is restricted"You cannot know"

    This could work at multiple levels. At the SQL level, a new predicate: salary IS DENIED. At the wire protocol level, result set metadata indicating masking status per column. At the data format level, Arrow could extend its null bitmap to two bits per cell: 00=value, 01=null, 10=denied - backward compatible since unaware systems treat denied as null.

    Other fields solved this long ago. Rust has Option<T> (might be absent) vs Result<T, E> (failed, and here's why). HTTP has 204 No Content vs 403 Forbidden vs 404 Not Found. SQL has... just NULL.

    Why This Matters Now

    Four trends make this urgent:

    AI agents are querying data autonomously. A human analyst might suspect masking and investigate. An AI agent treats NULL as NULL and makes wrong decisions silently. As agentic data access grows, this ambiguity becomes a systematic source of AI errors.

    Data mesh is decentralizing governance. Fine Grain Access Control - FGAC - policies are exploding. What used to be "the DBA handles it" is now "every domain team defines their own access rules" each with different conventions for what NULL means.

    ML pipelines are training on masked data. When a model trains on data where NULL means two different things, it learns the wrong thing. A recommendation engine that treats "user declined to provide age" the same as "user's age is hidden by GDPR policy" makes systematically different errors.

    Data contracts can't express this. No data contract framework can say "this column may contain values, genuine NULLs, or access-denied markers" because the distinction doesn't exist in any type system.

    What You Can Do Today

    Until standards catch up:

    • Fail explicit requests, omit from SELECT *. If someone asks for salary by name, tell them they can't have it. If they write SELECT *, quietly exclude it.
    • Never substitute NULL silently. If you must mask, use a non-NULL sentinel like '[REDACTED]' so consumers know it's not real data.
    • Make it configurable per-policy. Not all columns are equally sensitive.
    • Log every denial. Even if the user sees no error, record that data was withheld.

    I Want to Hear From You

    This affects every organization with non-trivial access controls, which, in the era of GDPR, CCPA, and HIPAA, is nearly everyone.

    I strongly believe that the gap between "no data" and "none of your business" is one the industry needs to close. The data engineering community should get together and create a proposal to add support for an equivalent of DBNULL but for DENIED, and then work towards integrating that to SQL standards, storage formats and compute engines.

    If you have any thoughts on this or can help with the proposal, email me faiz@getnile.ai. If there's interest, I will put together a conversation or webinar on this topic.

    Further Reading

    Building the Future of Data Security

    Nile is rethinking how data platforms handle versioning, access control, and pipeline safety. See what's possible.

    Try the Live Demo