---
name: ckg-aws-data-catalog
description: "Amazon Web Services data catalog & governance stack — complete CKG of the live AWS documentation surface (AWS Glue Data Catalog, Glue crawlers/classifiers, AWS Glue Data Quality/DQDL, AWS Glue Schema Registry, AWS Lake Formation, Amazon DataZone / SageMaker Catalog / SageMaker Unified Studio, OpenLineage data lineage, Amazon S3 Tables, and open table formats). 268 nodes, 387 edges, 9 domains. AWS splits the catalog across Glue (technical) + Lake Formation (governance) + DataZone (business)."
metadata:
  node_type: reference
  type: reference
  version: 1.0.0
  date: 2026-06-18
  source: "AWS documentation — docs.aws.amazon.com (glue, lake-formation, datazone, sagemaker-unified-studio, AmazonS3/s3-tables)"
  nodes: 268
  edges: 387
  domains: 9
  formats:
    - md
---

# AWS Data Catalog & Governance Stack — Compressed Knowledge Graph (CKG) v1.0.0
# Source: docs.aws.amazon.com (glue · lake-formation · datazone · sagemaker-unified-studio · AmazonS3/s3-tables) — live docs, fetched 2026-06-18
# NOTE: AWS has NO single unified catalog. The 'knowledge catalog' equivalent is split across three planes:
#   Glue Data Catalog (technical metastore) + Lake Formation (fine-grained governance) + DataZone/SageMaker Catalog (business catalog).
# NOTE: Amazon DataZone is being rebranded as 'Amazon SageMaker Catalog' and folded into SageMaker Unified Studio; existing DataZone domains upgrade in place.
# Generated by Graphify.md | graphifymd.com
# 268 concepts · 387 dependency edges · 9 domains
# Paste into any LLM, ask: "What depends on [concept]?" or "Trace the path from Amazon S3 bucket to Cell-level security."

## META
domain:      aws-data-catalog-governance
nodes:       268
edges:       387
domains:     9 (CORE · META · SRCH · DQ · LIN · GOV · GLOS · OPEN · PROC)
edge_type:   technical dependency (each deps entry is a source concept required to understand/implement the concept on that line, the target)
version:     1.0.0
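
The concept lines in the sections that follow share one shape, `name   deps: A | B`, so the two suggested questions ("What depends on X?" and "Trace the path from X to Y") can also be answered mechanically. The sketch below is one possible reader for that shape, not part of the Graphify tooling; the function names are illustrative, and `SAMPLE` is a five-line excerpt of the CORE and META sections.

```python
from collections import defaultdict, deque


def parse_ckg(text):
    """Map each concept to its list of prerequisites from 'Name   deps: A | B' lines."""
    graph = {}
    for line in text.splitlines():
        if line.startswith("#") or " deps: " not in line:
            continue  # skip headings, source comments, and META key/value lines
        name, _, rhs = line.partition(" deps: ")
        rhs = rhs.strip()
        graph[name.strip()] = [] if rhs.startswith("none") else [d.strip() for d in rhs.split("|")]
    return graph


def dependents(graph, concept):
    """Answer 'What depends on [concept]?' with the direct dependents only."""
    return sorted(name for name, deps in graph.items() if concept in deps)


def trace_path(graph, start, goal):
    """Breadth-first search from a prerequisite toward a concept that builds on it."""
    children = defaultdict(list)
    for name, deps in graph.items():
        for dep in deps:
            children[dep].append(name)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in children[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no directed path exists


SAMPLE = """\
AWS account                   deps: none (root)
Amazon S3 data lake           deps: AWS account
AWS Glue Data Catalog         deps: AWS Glue (service) | Amazon S3 data lake
Central technical metastore   deps: AWS Glue Data Catalog
Database                      deps: Central technical metastore
"""

graph = parse_ckg(SAMPLE)
print(dependents(graph, "AWS Glue Data Catalog"))
print(trace_path(graph, "Amazon S3 data lake", "Database"))
```

`trace_path` follows edges only in the prerequisite-to-dependent direction, so it returns `None` when the goal does not build on the start concept.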

## CORE — Platform & Data Organization (26 concepts)
# Source: glue/catalog-and-crawler · lake-formation/what-is-lake-formation · datazone/datazone-concepts · sagemaker-unified-studio/data-governance · s3/s3-tables
AWS account                                                       deps: none (root)
Amazon S3 data lake                                               deps: AWS account
Amazon S3 bucket                                                  deps: Amazon S3 data lake
AWS analytics & ML services                                       deps: AWS account
Amazon Athena                                                     deps: AWS analytics & ML services
Amazon EMR                                                        deps: AWS analytics & ML services
Amazon Redshift                                                   deps: AWS analytics & ML services
Amazon Redshift Spectrum                                          deps: Amazon Redshift
AWS Glue (service)                                                deps: AWS analytics & ML services
AWS Glue Data Catalog                                             deps: AWS Glue (service) | Amazon S3 data lake
Central technical metastore                                       deps: AWS Glue Data Catalog
Catalog resource (multi-level federated catalog)                  deps: AWS Glue Data Catalog
Catalog ID (CatalogId)                                            deps: AWS Glue Data Catalog | AWS account
Default Data Catalog                                              deps: AWS Glue Data Catalog | Catalog resource (multi-level federated catalog)
AWS Lake Formation                                                deps: AWS Glue Data Catalog
Amazon DataZone                                                   deps: AWS analytics & ML services
Amazon SageMaker Catalog (DataZone rebrand)                       deps: Amazon DataZone
Amazon SageMaker Unified Studio                                   deps: Amazon SageMaker Catalog (DataZone rebrand)
Data mesh                                                         deps: AWS Glue Data Catalog | Amazon DataZone
Three-layer catalog split (Glue + Lake Formation + DataZone)      deps: AWS Glue Data Catalog | AWS Lake Formation | Amazon DataZone
Business data catalog                                             deps: Amazon DataZone
Data portal                                                       deps: Amazon DataZone | Business data catalog
Self-service data access                                          deps: Data portal | Publish-subscribe workflow
Data producer                                                     deps: Data mesh
Data consumer                                                     deps: Data mesh
Producer-consumer separation                                      deps: Data producer | Data consumer
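
As a rough illustration of the technical plane, the sketch below walks one Data Catalog database with the boto3 `get_tables` paginator and yields each table name with its Amazon S3 location. The helper name `iter_tables` and the database name in the usage comment are placeholders; the client is passed in so the function can be exercised without AWS access.

```python
def iter_tables(glue_client, database, catalog_id=None):
    """Yield (table_name, s3_location) for every table in one Data Catalog database."""
    kwargs = {"DatabaseName": database}
    if catalog_id is not None:
        # When CatalogId is omitted, the caller's AWS account ID is used.
        kwargs["CatalogId"] = catalog_id
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(**kwargs):
        for table in page.get("TableList", []):
            location = table.get("StorageDescriptor", {}).get("Location")
            yield table["Name"], location


# Usage with a real client (needs boto3 and AWS credentials; "sales_db" is a placeholder):
#   import boto3
#   for name, location in iter_tables(boto3.client("glue"), "sales_db"):
#       print(name, location)
```

Governance (Lake Formation) and business metadata (DataZone) are not visible through this call; it reads only the Glue metastore layer.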

## META — Catalog Metadata Model (35 concepts)
# Source: glue/tables-described · manage-catalog · aws-glue-api-catalog-partitions · schema-registry · connection-properties · column-statistics
Database                                                          deps: Central technical metastore
Table (metadata table)                                            deps: Database
Column                                                            deps: Table (metadata table)
Schema (table definition)                                         deps: Table (metadata table) | Column
Nested column / struct                                            deps: Column
Table version                                                     deps: Table (metadata table)
Partition                                                         deps: Table (metadata table) | Amazon S3 bucket
Partition key                                                     deps: Partition | Schema (table definition)
Partition index                                                   deps: Partition | Partition key
SerDe (serialization library)                                     deps: Table (metadata table)
Table properties / parameters                                     deps: Table (metadata table)
Table location (Amazon S3 path)                                   deps: Table (metadata table) | Amazon S3 bucket
AWS Glue Data Catalog view                                        deps: Table (metadata table) | AWS Lake Formation
Resource link                                                     deps: Database | Table (metadata table)
Connection (connection definition)                                deps: Central technical metastore
JDBC connection                                                   deps: Connection (connection definition)
NETWORK connection                                                deps: Connection (connection definition)
MONGODB connection                                                deps: Connection (connection definition)
KAFKA connection                                                  deps: Connection (connection definition)
MARKETPLACE / CUSTOM connector connection                         deps: Connection (connection definition)
Column statistics                                                 deps: Column | Table (metadata table)
Materialized view (Iceberg)                                       deps: Table (metadata table) | Open table format support
AWS Glue Schema Registry                                          deps: AWS Glue (service)
Registry                                                          deps: AWS Glue Schema Registry
Schema (stream schema)                                            deps: Registry
Schema version                                                    deps: Schema (stream schema)
Schema definition                                                 deps: Schema version
Data format (Avro / JSON Schema / Protobuf)                       deps: Schema (stream schema)
Compatibility mode                                                deps: Schema version
Schema checkpoint version                                         deps: Compatibility mode | Schema version
Serializer / deserializer (SerDe libraries)                       deps: Schema version | Data format (Avro / JSON Schema / Protobuf)
Schema evolution (streams)                                        deps: Schema version | Compatibility mode
Streaming integration (Kafka/MSK/Kinesis/Flink/Lambda)            deps: Serializer / deserializer (SerDe libraries) | KAFKA connection
Federation to external data sources                               deps: Catalog resource (multi-level federated catalog) | Connection (connection definition)
Amazon Redshift managed catalog (RMS)                             deps: Catalog resource (multi-level federated catalog) | Amazon Redshift
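
The Registry -> Schema -> Schema version chain can be sketched as a request builder for the Glue `CreateSchema` operation. This is a minimal sketch: the builder name, registry name, and Avro record are illustrative, and the eight compatibility modes are the ones listed in Appendix A.

```python
import json

COMPATIBILITY_MODES = {
    "NONE", "DISABLED", "BACKWARD", "BACKWARD_ALL",
    "FORWARD", "FORWARD_ALL", "FULL", "FULL_ALL",
}
DATA_FORMATS = {"AVRO", "JSON", "PROTOBUF"}


def build_create_schema_request(registry, schema_name, definition,
                                data_format="AVRO", compatibility="BACKWARD"):
    """Build keyword arguments for glue.create_schema (first version of a stream schema)."""
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unsupported data format: {data_format}")
    if compatibility not in COMPATIBILITY_MODES:
        raise ValueError(f"unknown compatibility mode: {compatibility}")
    return {
        "RegistryId": {"RegistryName": registry},
        "SchemaName": schema_name,
        "DataFormat": data_format,
        "Compatibility": compatibility,
        # The API expects the schema definition as a string.
        "SchemaDefinition": definition if isinstance(definition, str) else json.dumps(definition),
    }


# Illustrative Avro record; field names are placeholders.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "order_id", "type": "string"}, {"name": "amount", "type": "double"}],
}

request = build_create_schema_request("demo-registry", "orders", ORDER_SCHEMA)
# With boto3 this would be sent as: boto3.client("glue").create_schema(**request)
```

Later versions of the same schema would go through `register_schema_version`, where the chosen compatibility mode decides whether the new definition is accepted.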

## SRCH — Crawlers, Classifiers & Discovery (25 concepts)
# Source: glue/catalog-and-crawler · add-crawler · add-classifier · custom-classifier · datazone/working-with-business-catalog (search/browse/subscribe)
Crawler                                                           deps: AWS Glue Data Catalog
Crawler data store / data source                                  deps: Crawler | Amazon S3 bucket
Automatic discovery                                               deps: Crawler
Schema inference                                                  deps: Crawler | Schema (table definition)
Include / exclude patterns                                        deps: Crawler data store / data source
Crawler schedule                                                  deps: Crawler
Incremental crawl                                                 deps: Crawler | Crawler schedule
Recrawl policy                                                    deps: Crawler | Incremental crawl
Schema change policy (SchemaChangePolicy)                         deps: Crawler | Schema (table definition)
Partition detection (crawler)                                     deps: Crawler | Partition
Classifier                                                        deps: Crawler
Built-in classifier                                               deps: Classifier
Custom classifier                                                 deps: Classifier
Classifier certainty                                              deps: Classifier
Classifier order / priority                                       deps: Classifier | Custom classifier
Grok classifier                                                   deps: Custom classifier
XML classifier                                                    deps: Custom classifier
JSON path classifier                                              deps: Custom classifier
CSV classifier                                                    deps: Built-in classifier
Classification string                                             deps: Classifier | Schema inference
Hudi / Iceberg crawler support                                    deps: Crawler | Open table format support
DataZone search & browse                                          deps: Business data catalog | Published asset
Semantic search (Glue Catalog, Preview)                           deps: AWS Glue Data Catalog | Business context enrichment
Glue Search API                                                   deps: Semantic search (Glue Catalog, Preview)
Subscribe (request access)                                        deps: DataZone search & browse | Subscription request
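
To show how the crawler concepts above fit together, here is a hedged sketch that assembles a `CreateCrawler` request from an S3 target, exclude patterns, a schedule, a recrawl policy, and a schema change policy. The builder name and all argument values are placeholders. It assumes that an incremental crawl (`CRAWL_NEW_FOLDERS_ONLY`) must use `LOG` for both schema change behaviors; check that constraint against the current Glue documentation before relying on it.

```python
def build_crawler_request(name, role_arn, database, s3_path,
                          exclusions=(), classifiers=(), schedule=None, incremental=False):
    """Build keyword arguments for glue.create_crawler with one S3 data store."""
    if incremental:
        recrawl = "CRAWL_NEW_FOLDERS_ONLY"
        # Assumption: incremental crawls only log schema changes.
        policy = {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"}
    else:
        recrawl = "CRAWL_EVERYTHING"
        policy = {"UpdateBehavior": "UPDATE_IN_DATABASE", "DeleteBehavior": "DEPRECATE_IN_DATABASE"}
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path, "Exclusions": list(exclusions)}]},
        "Classifiers": list(classifiers),  # custom classifiers, tried before the built-in ones
        "RecrawlPolicy": {"RecrawlBehavior": recrawl},
        "SchemaChangePolicy": policy,
    }
    if schedule is not None:
        request["Schedule"] = schedule  # cron expression, e.g. "cron(0 2 * * ? *)"
    return request


crawler_request = build_crawler_request(
    name="orders-crawler",                                    # placeholder
    role_arn="arn:aws:iam::123456789012:role/DemoCrawlerRole",  # placeholder
    database="sales_db",
    s3_path="s3://example-bucket/orders/",
    exclusions=["**/_tmp/**"],
    schedule="cron(0 2 * * ? *)",
    incremental=True,
)
# With boto3 this would be sent as: boto3.client("glue").create_crawler(**crawler_request)
```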

## DQ — Data Quality (AWS Glue Data Quality) (47 concepts)
# Source: glue/glue-data-quality · dqdl · dqdl-rule-types · data-quality-anomaly-detection · data-quality-using-apis
AWS Glue Data Quality                                             deps: AWS Glue (service)
DeeQu framework                                                   deps: AWS Glue Data Quality
Data quality for the Data Catalog (entry point)                   deps: AWS Glue Data Quality | Table (metadata table)
Data quality for ETL jobs (entry point)                           deps: AWS Glue Data Quality | AWS Glue ETL job
Data Quality Definition Language (DQDL)                           deps: AWS Glue Data Quality
Ruleset                                                           deps: Data Quality Definition Language (DQDL) | Table (metadata table)
Rule                                                              deps: Data Quality Definition Language (DQDL)
Rule type                                                         deps: Rule
Rule parameters                                                   deps: Rule type
Expression                                                        deps: Rule
Composite rule (and/or/not)                                       deps: Rule | Expression
compositeRuleEvaluation.method (ROW/COLUMN)                       deps: Composite rule (and/or/not)
Where clause filter                                               deps: Rule
Threshold (with threshold)                                        deps: Expression
NULL / EMPTY / WHITESPACES_ONLY keywords                          deps: Expression
Constants                                                         deps: Data Quality Definition Language (DQDL)
Labels                                                            deps: Rule
Completeness rule                                                 deps: Rule type
ColumnValues rule                                                 deps: Rule type
ColumnCount rule                                                  deps: Rule type
ColumnDataType rule                                               deps: Rule type
ColumnLength rule                                                 deps: Rule type
ColumnCorrelation rule                                            deps: Rule type
RowCount rule                                                     deps: Rule type
Uniqueness rule                                                   deps: Rule type
IsComplete rule                                                   deps: Rule type
IsUnique rule                                                     deps: Rule type
IsPrimaryKey rule                                                 deps: Rule type
DataFreshness rule                                                deps: Rule type
ReferentialIntegrity rule                                         deps: Rule type
SchemaMatch rule                                                  deps: Rule type
DatasetMatch / AggregateMatch / RowCountMatch rules               deps: Rule type
CustomSQL rule                                                    deps: Rule type
FileFreshness / FileSize / FileUniqueness / FileMatch rules       deps: Rule type
DetectAnomalies rule                                              deps: Rule type | Anomaly detection (ML)
Analyzer                                                          deps: Data Quality Definition Language (DQDL)
AllStatistics analyzer                                            deps: Analyzer
Statistics                                                        deps: Analyzer | Rule
Observation                                                       deps: Statistics
Anomaly detection (ML)                                            deps: AWS Glue Data Quality | Statistics
Dynamic rule (last() operator)                                    deps: Rule | Statistics
Rule recommendations                                              deps: AWS Glue Data Quality | Statistics
Data quality score                                                deps: Ruleset | Rule
Data quality results                                              deps: Ruleset | Data quality score
Row-level error record identification                             deps: Data quality results | Data quality for ETL jobs (entry point)
Write DQ results to Amazon S3                                     deps: Data quality results | Amazon S3 bucket
EventBridge / CloudWatch DQ integration                           deps: Data quality results
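
A small example may make the Ruleset -> Rule -> score chain concrete. The sketch below renders a DQDL ruleset from individual rule strings and computes a score as the percentage of passing rules, following the "Score = % rules passing" note in Appendix A. The helper names, column names, and thresholds are illustrative; the 2,000-rule and 65 KB limits are taken from the same appendix.

```python
MAX_RULES = 2000              # ruleset limit quoted in Appendix A
MAX_RULESET_BYTES = 65 * 1024  # ruleset size limit quoted in Appendix A


def build_ruleset(rules):
    """Render a DQDL ruleset string from individual rule strings."""
    if not rules:
        raise ValueError("a ruleset needs at least one rule")
    if len(rules) > MAX_RULES:
        raise ValueError("too many rules for one ruleset")
    body = ",\n".join(f"    {rule}" for rule in rules)
    ruleset = f"Rules = [\n{body}\n]"
    if len(ruleset.encode("utf-8")) > MAX_RULESET_BYTES:
        raise ValueError("ruleset exceeds the size limit")
    return ruleset


def quality_score(outcomes):
    """Percentage of rule outcomes equal to 'PASS'."""
    if not outcomes:
        raise ValueError("no rule outcomes to score")
    return 100.0 * sum(1 for outcome in outcomes if outcome == "PASS") / len(outcomes)


# Column names and thresholds are placeholders.
RULES = [
    'IsComplete "order_id"',
    'IsUnique "order_id"',
    'ColumnValues "status" in ["OPEN", "SHIPPED", "CLOSED"]',
    'Completeness "customer_email" > 0.95',
    "RowCount > 0",
]

ruleset = build_ruleset(RULES)
print(ruleset)
# With boto3 the string could be registered against a catalog table, for example:
#   boto3.client("glue").create_data_quality_ruleset(
#       Name="orders-ruleset", Ruleset=ruleset,
#       TargetTable={"DatabaseName": "sales_db", "TableName": "orders"})
```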

## LIN — Data Lineage (DataZone / OpenLineage) (19 concepts)
# Source: datazone/datazone-data-lineage · datazone-data-lineage-what-is-openlineage · glue catalog data lineage
DataZone data lineage                                             deps: Business data catalog
OpenLineage compatibility                                         deps: DataZone data lineage
Lineage node                                                      deps: DataZone data lineage
Dataset node                                                      deps: Lineage node
Job (run) node                                                    deps: Lineage node
RunEvent / JobEvent / DatasetEvent                                deps: OpenLineage compatibility
Lineage event                                                     deps: OpenLineage compatibility | RunEvent / JobEvent / DatasetEvent
sourceIdentifier                                                  deps: Lineage node
Lineage facet / form type                                         deps: Lineage event | Metadata form
Upstream / downstream traversal                                   deps: Dataset node | Job (run) node
Column-level lineage                                              deps: Dataset node | Column
Lineage versioning (history)                                      deps: Lineage node | Lineage event
PostLineageEvent API                                              deps: Lineage event
GetLineageNode / ListLineageNodeHistory API                       deps: Lineage node
Automated lineage from Glue catalog                               deps: DataZone data lineage | Data source run
Automated lineage from Amazon Redshift                            deps: DataZone data lineage | Amazon Redshift
OpenLineage Spark listener (Glue 5.0)                             deps: OpenLineage compatibility | AWS Glue ETL job
Lineage capture in blueprint config                               deps: Automated lineage from Glue catalog | Environment blueprint
Glue Data Catalog lineage record                                  deps: AWS Glue Data Catalog | DataZone data lineage
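
For custom lineage, a lineage event is an OpenLineage JSON document posted through PostLineageEvent. The following sketch builds a minimal RunEvent with one job, its input datasets, and its output datasets. The builder name, the producer URL, the dataset namespaces, and the choice of spec version in `schemaURL` are assumptions made for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_namespace, job_name, inputs, outputs,
                    event_type="COMPLETE", run_id=None, event_time=None):
    """Build an OpenLineage RunEvent dict; inputs/outputs are (namespace, name) pairs."""
    when = event_time or datetime.now(timezone.utc)
    return {
        "eventType": event_type,
        "eventTime": when.isoformat(),
        "producer": "https://example.com/hypothetical-producer",  # placeholder
        # Assumed spec version; use the one your tooling targets.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


event = build_run_event(
    job_namespace="demo-etl",
    job_name="load_orders",
    inputs=[("s3://example-bucket", "raw/orders")],
    outputs=[("s3://example-bucket", "curated/orders")],
)
payload = json.dumps(event).encode("utf-8")
# With boto3 the payload could be posted as (domain ID is a placeholder):
#   boto3.client("datazone").post_lineage_event(domainIdentifier="dzd_example", event=payload)
```

Each dataset and job in the event becomes a lineage node, which is what makes upstream and downstream traversal possible afterwards.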

## GOV — Governance, Security & Access Control (Lake Formation + IAM + KMS) (46 concepts)
# Source: lake-formation/what-is-lake-formation · TBAC-overview · lf-permissions-reference · data-filtering · managing-tags · catalog-encryption · cross-account-permissions
Lake Formation permissions model                                  deps: AWS Lake Formation
Data lake administrator                                           deps: AWS Lake Formation
Lake Formation principal                                          deps: Lake Formation permissions model
Data location registration                                        deps: AWS Lake Formation | Amazon S3 bucket
DATA_LOCATION_ACCESS permission                                   deps: Data location registration | Lake Formation permissions model
Named-resource permission method                                  deps: Lake Formation permissions model | Database | Table (metadata table)
Data lake permission (SELECT/INSERT/ALTER/DROP/DELETE/DESCRIBE)   deps: Named-resource permission method
CREATE_DATABASE / CREATE_TABLE permission                         deps: Named-resource permission method | Database
Super permission (ALL)                                            deps: Data lake permission (SELECT/INSERT/ALTER/DROP/DELETE/DESCRIBE)
Super user permission (catalog)                                   deps: Super permission (ALL) | Catalog resource (multi-level federated catalog)
Grantable permission (grant option)                               deps: Data lake permission (SELECT/INSERT/ALTER/DROP/DELETE/DESCRIBE)
Implicit permission (DESCRIBE)                                    deps: Data lake permission (SELECT/INSERT/ALTER/DROP/DELETE/DESCRIBE)
IAMAllowedPrincipals group                                        deps: Lake Formation permissions model
ALLIAMPrincipals group                                            deps: Lake Formation permissions model
Hybrid access mode                                                deps: Lake Formation permissions model | IAMAllowedPrincipals group
LF-Tag (key-value pair)                                           deps: AWS Lake Formation
LF-Tag definition                                                 deps: LF-Tag (key-value pair)
Tag-based access control (LF-TBAC)                                deps: LF-Tag (key-value pair) | Lake Formation permissions model
LF-Tag assignment to resources                                    deps: LF-Tag (key-value pair) | Database | Table (metadata table) | Column
LF-Tag expression / policy                                        deps: Tag-based access control (LF-TBAC) | LF-Tag assignment to resources
LF-Tag inheritance                                                deps: LF-Tag assignment to resources
ASSOCIATE permission (LF-Tag)                                     deps: LF-Tag (key-value pair) | Tag-based access control (LF-TBAC)
CREATE_LF_TAG / ALTER / DROP LF-Tag permission                    deps: LF-Tag (key-value pair)
Attribute-based access control (ABAC, IAM tags)                   deps: Lake Formation permissions model
Data filter (DataCellsFilter)                                     deps: Named-resource permission method | Table (metadata table)
Column-level security (column filtering)                          deps: Data filter (DataCellsFilter) | Column
Row-level security (row filtering)                                deps: Data filter (DataCellsFilter)
Cell-level security                                               deps: Column-level security (column filtering) | Row-level security (row filtering)
Row filter expression (PartiQL WHERE)                             deps: Row-level security (row filtering)
Column include / exclude list                                     deps: Column-level security (column filtering)
AllRowsWildcard / all-columns wildcard                            deps: Data filter (DataCellsFilter)
CreateDataCellsFilter API                                         deps: Data filter (DataCellsFilter)
Cross-account data sharing                                        deps: Lake Formation permissions model | Tag-based access control (LF-TBAC)
Cross-account version 3 settings                                  deps: Cross-account data sharing
Share to organization / OU                                        deps: Cross-account data sharing
Credential vending (integrated engine access)                     deps: Data filter (DataCellsFilter) | AWS analytics & ML services
AWS IAM                                                           deps: AWS account
IAM role / user (principal)                                       deps: AWS IAM
IAM policy (coarse-grained)                                       deps: AWS IAM
IAM Identity Center principal                                     deps: AWS IAM
Resource policy (Data Catalog)                                    deps: AWS Glue Data Catalog | AWS IAM
AWS KMS                                                           deps: AWS account
Data Catalog encryption (KMS)                                     deps: AWS KMS | Central technical metastore
CloudTrail audit logging                                          deps: AWS Lake Formation | AWS IAM
Blueprint (Lake Formation ingestion)                              deps: AWS Lake Formation | Crawler
Lake Formation workflow                                           deps: Blueprint (Lake Formation ingestion)
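
Cell-level security is the combination of a row filter and a column filter in one data filter. The sketch below builds a `CreateDataCellsFilter` request that covers the three cases: rows only, columns only, or both. The builder name, account ID, table, and column names are placeholders.

```python
def build_data_cells_filter(catalog_id, database, table, filter_name,
                            row_filter=None, include_columns=None, exclude_columns=None):
    """Build keyword arguments for lakeformation.create_data_cells_filter."""
    if include_columns and exclude_columns:
        raise ValueError("use either an include list or an exclude list, not both")
    table_data = {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
        # A PartiQL predicate restricts rows; the wildcard keeps all rows.
        "RowFilter": {"FilterExpression": row_filter} if row_filter else {"AllRowsWildcard": {}},
    }
    if include_columns:
        table_data["ColumnNames"] = list(include_columns)
    else:
        # An empty exclude list means all columns are visible.
        table_data["ColumnWildcard"] = {"ExcludedColumnNames": list(exclude_columns or [])}
    return {"TableData": table_data}


cell_filter = build_data_cells_filter(
    catalog_id="123456789012",
    database="sales_db",
    table="orders",
    filter_name="emea_no_pii",
    row_filter="region = 'EMEA'",
    exclude_columns=["customer_email"],
)
# With boto3: boto3.client("lakeformation").create_data_cells_filter(**cell_filter)
```

The filter grants nothing by itself; a principal still needs a SELECT grant on the filter resource before an integrated engine applies it.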

## GLOS — Business Catalog, Glossary, Metadata Forms & Subscriptions (DataZone) (41 concepts)
# Source: datazone/datazone-concepts · working-with-business-catalog · create-metadata-form · sagemaker-unified-studio/data-governance
Domain                                                            deps: Amazon DataZone
Domain unit                                                       deps: Domain
Root domain unit                                                  deps: Domain unit
Associated account                                                deps: Domain | AWS account
Authorization policy                                              deps: Domain unit
Project                                                           deps: Domain
Project membership (owner/contributor/consumer/steward/viewer)    deps: Project
Environment                                                       deps: Project
Environment profile                                               deps: Environment
Environment blueprint                                             deps: Environment profile
Data lake blueprint                                               deps: Environment blueprint | AWS Lake Formation
Data warehouse blueprint                                          deps: Environment blueprint | Amazon Redshift
SageMaker blueprint                                               deps: Environment blueprint | Amazon SageMaker Unified Studio
Data source                                                       deps: Project | AWS Glue Data Catalog
Data source run                                                   deps: Data source
Inventory asset                                                   deps: Project | Data source run
Asset                                                             deps: Inventory asset
Asset type                                                        deps: Asset
System asset type (Glue/Redshift/S3)                              deps: Asset type
Custom asset type                                                 deps: Asset type
Asset version / revision                                          deps: Asset
Published asset                                                   deps: Asset | Business data catalog
Listing (published asset)                                         deps: Published asset
Publishing workflow                                               deps: Inventory asset | Published asset
Metadata form                                                     deps: Asset | Domain
Metadata form type                                                deps: Metadata form
Metadata form field                                               deps: Metadata form
Business glossary                                                 deps: Domain
Glossary term                                                     deps: Business glossary
Term-to-asset / term-to-column assignment                         deps: Glossary term | Asset | Column
Business context enrichment                                       deps: AWS Glue Data Catalog | Business glossary | Metadata form
Data product                                                      deps: Asset | Data mesh
Subscription request                                              deps: Published asset | Project
Subscription approval (approve/reject/revoke/grant)               deps: Subscription request
Subscription grant                                                deps: Subscription approval (approve/reject/revoke/grant)
Subscription target                                               deps: Subscription grant | Environment
Subscription fulfillment workflow                                 deps: Subscription grant | AWS Lake Formation | Amazon Redshift
Managed vs unmanaged asset                                        deps: Subscription fulfillment workflow
Publish-subscribe workflow                                        deps: Publishing workflow | Subscription request
User / group profile                                              deps: Domain
DataZone to SageMaker upgrade                                     deps: Amazon DataZone | Amazon SageMaker Catalog (DataZone rebrand)
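
The consumer side of the publish-subscribe workflow starts with a subscription request that ties a published listing to a consuming project. A minimal sketch of that request follows; the builder name and all identifiers are placeholders, and the parameter names should be checked against the current DataZone API reference.

```python
def build_subscription_request(domain_id, listing_id, project_id, reason):
    """Build keyword arguments for datazone.create_subscription_request."""
    if not reason.strip():
        raise ValueError("a request reason is required")
    return {
        "domainIdentifier": domain_id,
        "requestReason": reason,
        "subscribedListings": [{"identifier": listing_id}],
        "subscribedPrincipals": [{"project": {"identifier": project_id}}],
    }


subscription = build_subscription_request(
    domain_id="dzd_example",       # placeholder
    listing_id="listing_example",  # placeholder
    project_id="project_example",  # placeholder
    reason="Quarterly revenue analysis",
)
# With boto3: boto3.client("datazone").create_subscription_request(**subscription)
```

Once the publishing project approves the request, the resulting subscription grant is fulfilled against a subscription target in the consumer's environment.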

## OPEN — Open Table Formats, S3 Tables & Engine Integration (20 concepts)
# Source: glue native OTF support (Iceberg/Hudi/Delta) · s3/s3-tables · table-optimizers · create-s3-tables-catalog · iceberg REST
Open table format support                                         deps: AWS Glue Data Catalog
Apache Iceberg (on Glue)                                          deps: Open table format support
Apache Hudi (on Glue)                                             deps: Open table format support
Delta Lake (on Glue)                                              deps: Open table format support
Transactional data lake table                                     deps: Apache Iceberg (on Glue) | Apache Hudi (on Glue) | Delta Lake (on Glue)
--datalake-formats job parameter                                  deps: Transactional data lake table | AWS Glue ETL job
Iceberg managed compaction / table optimization                   deps: Apache Iceberg (on Glue) | Table (metadata table)
Amazon S3 Tables                                                  deps: Amazon S3 data lake | Apache Iceberg (on Glue)
Table bucket                                                      deps: Amazon S3 Tables
Table namespace                                                   deps: Table bucket
S3 Tables table (managed Iceberg)                                 deps: Table bucket | Table namespace
S3 Tables maintenance (compaction/snapshot/unref removal)         deps: S3 Tables table (managed Iceberg)
s3tables service namespace                                        deps: Amazon S3 Tables
S3 Tables integration with Glue Data Catalog                      deps: Amazon S3 Tables | AWS Glue Data Catalog | AWS Lake Formation
Apache Iceberg V3                                                 deps: Amazon S3 Tables | Apache Iceberg (on Glue)
Iceberg REST catalog endpoint                                     deps: AWS Glue Data Catalog | Apache Iceberg (on Glue)
Athena query on cataloged tables                                  deps: Amazon Athena | Table (metadata table)
EMR access to Data Catalog                                        deps: Amazon EMR | AWS Glue Data Catalog
Redshift Spectrum on external tables                              deps: Amazon Redshift Spectrum | Table (metadata table)
Engine fine-grained access enforcement                            deps: Credential vending (integrated engine access) | Athena query on cataloged tables | EMR access to Data Catalog
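
As an illustration of the `--datalake-formats` job parameter, the sketch below produces Glue job arguments that enable Iceberg and point a Spark catalog at the Glue Data Catalog. The function name, catalog name, and warehouse path are placeholders. Chaining several Spark settings inside the single `--conf` value follows the pattern commonly shown for Glue jobs and is treated here as an assumption.

```python
def iceberg_job_arguments(warehouse_path, catalog_name="glue_catalog"):
    """Return Glue job arguments that enable Iceberg backed by the Glue Data Catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse_path,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }
    # Assumption: only one --conf key is available, so settings are chained in its value.
    chained = " --conf ".join(f"{key}={value}" for key, value in conf.items())
    return {"--datalake-formats": "iceberg", "--conf": chained}


job_args = iceberg_job_arguments("s3://example-bucket/warehouse/")
for key, value in job_args.items():
    print(key, value)
```

Tables created through this catalog are registered in the Glue Data Catalog, so they can be reached by the same engines and governed by the same Lake Formation grants as other cataloged tables.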

## PROC — Glue Compute (ETL Jobs & Crawler Runtime) (9 concepts)
# Source: glue ETL jobs · crawlers compute · update-from-job
AWS Glue ETL job                                                  deps: AWS Glue (service)
AWS Glue Spark engine                                             deps: AWS Glue ETL job
AWS Glue version (3.0/4.0/5.0/5.1)                                deps: AWS Glue Spark engine
DynamicFrame                                                      deps: AWS Glue ETL job
Job bookmark                                                      deps: AWS Glue ETL job
AWS Glue Studio (visual ETL)                                      deps: AWS Glue ETL job
Schema/partition update from ETL                                  deps: AWS Glue ETL job | Schema (table definition) | Partition
Crawler runtime                                                   deps: Crawler | AWS Glue (service)
EvaluateDataQuality transform                                     deps: AWS Glue ETL job | Data quality for ETL jobs (entry point)
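
A job definition ties several of these concepts together: the Spark engine, the Glue version, and the job bookmark setting. The sketch below builds a `CreateJob` request under those assumptions; the builder name, job name, role, script path, and worker sizing are placeholders, and the accepted versions are the ones listed in this section.

```python
GLUE_VERSIONS = {"3.0", "4.0", "5.0", "5.1"}  # versions named in this section


def build_job_request(name, role_arn, script_location, glue_version="5.0",
                      bookmarks=True, extra_arguments=None):
    """Build keyword arguments for glue.create_job (Spark ETL job)."""
    if glue_version not in GLUE_VERSIONS:
        raise ValueError(f"unexpected Glue version: {glue_version}")
    arguments = {
        "--job-bookmark-option": "job-bookmark-enable" if bookmarks else "job-bookmark-disable",
    }
    arguments.update(extra_arguments or {})
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": glue_version,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location, "PythonVersion": "3"},
        "DefaultArguments": arguments,
        "WorkerType": "G.1X",   # placeholder sizing
        "NumberOfWorkers": 2,   # placeholder sizing
    }


job_request = build_job_request(
    name="load-orders",                                     # placeholder
    role_arn="arn:aws:iam::123456789012:role/DemoGlueRole",   # placeholder
    script_location="s3://example-bucket/scripts/load_orders.py",
)
# With boto3: boto3.client("glue").create_job(**job_request)
```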

---

## APPENDIX A — KEY API RESOURCES

- **CORE** — Multi-plane: glue:GetDatabases/GetTables (Data Catalog); lakeformation:* (governance); datazone:* (business catalog). Catalog resources: Catalog (multi-level/federated), Database, Table. CatalogId defaults to AWS account ID. SageMaker Unified Studio wraps DataZone domains + Glue/Lake Formation.
- **META** — Glue Data Catalog: Database -> Table -> Column/Partition (+ PartitionIndex, ColumnStatistics, TableVersion, Connection). Schema Registry: Registry -> Schema -> SchemaVersion (Avro/JSON Schema/Protobuf; 8 compatibility modes NONE/DISABLED/BACKWARD[_ALL]/FORWARD[_ALL]/FULL[_ALL]). Limits: 100 registries, 10,000 schema versions/region, 170KB payload.
- **SRCH** — Crawler -> Classifier (custom Grok/XML/JSON-path first, then built-in; certainty 0.0-1.0; classification string e.g. 'json','csv','parquet'). RecrawlPolicy + SchemaChangePolicy. DataZone search/browse + Subscribe. Glue Search API + semantic search (Preview).
- **DQ** — AWS Glue Data Quality (DeeQu) -> Ruleset (<=2,000 rules, <=65KB) -> Rule (25+ rule types) + Analyzer -> Statistics -> Observation. DQDL: composite rules (and/or/not, ROW vs COLUMN), where clause, threshold, dynamic rules (last()), Labels, Constants. Two entry points: Data Catalog (recommendations, no anomaly/dynamic) vs ETL (anomaly detection, dynamic rules, row-level results). Score = % rules passing.
- **LIN** — DataZone data lineage (OpenLineage). PostLineageEvent (write) / GetLineageNode / ListLineageNodeHistory (read). Nodes: Dataset node + Job(run) node; sourceIdentifier enforces uniqueness. RunEvent/JobEvent/DatasetEvent + facets. Auto-captured from Glue catalog + Redshift + Glue 5.0 Spark (OpenLineageSparkListener -> amazon_datazone_api transport). Column-level lineage + versioned history.
- **GOV** — Lake Formation permissions per resource: Catalog(ALL/ALTER/CREATE_DATABASE/DESCRIBE/DROP/SUPER_USER), Database(CREATE_TABLE/...), Table(SELECT/INSERT/DELETE/ALTER/DROP/DESCRIBE), S3 location(DATA_LOCATION_ACCESS), LF-Tag(ASSOCIATE/DESCRIBE/CREATE_LF_TAG/ALTER/DROP). CreateDataCellsFilter = column+row+cell security (PartiQL row filter, include/exclude columns, AllRowsWildcard). IAMAllowedPrincipals + hybrid access mode. Cross-account v3 (orgs/OUs). KMS catalog encryption; CloudTrail audit.
- **GLOS** — DataZone: Domain -> DomainUnit -> Project -> Environment (via EnvironmentProfile + Blueprint: Data lake / Data warehouse / SageMaker). DataSource -> DataSourceRun -> Inventory Asset -> Publish -> Listing. AssetType (system Glue/Redshift/S3 + custom), MetadataForm(+FormType+Field), BusinessGlossary -> Term. SubscriptionRequest -> approve -> SubscriptionGrant -> SubscriptionTarget -> fulfillment (LF/Redshift grants). System metadata form types: glue-table-form-type, column-business-metadata-form-type, etc.
- **OPEN** — Glue native OTF (no connector): Iceberg/Hudi/Delta via --datalake-formats (Glue 3.0/4.0/5.0/5.1; libs Hudi 1.0.2, Iceberg 1.10.0, Delta 3.3.2 in 5.1). Amazon S3 Tables = managed Iceberg: Table bucket -> Namespace -> Table; s3tables namespace; auto maintenance (compaction/snapshot/unref removal); Iceberg V3; integrates with Glue Data Catalog + Lake Formation; Iceberg REST endpoint. Engines: Athena/EMR/Redshift Spectrum enforce LF column/row filtering.
- **PROC** — AWS Glue ETL job (Spark; Glue 3.0/4.0/5.0/5.1) -> DynamicFrame, job bookmark, Glue Studio visual editor. EvaluateDataQuality transform runs DQDL in-pipeline. ETL jobs update schema/partitions in the Data Catalog. Crawler runtime populates the catalog.
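
As an illustration of the **DQ** entry, here is a minimal sketch that assembles a DQDL ruleset string and checks it against the limits quoted above (<=2,000 rules, <=65KB). The helper `build_ruleset`, the column names, and the reading of "65KB" as 65 * 1024 bytes are assumptions made for this example. The `last()` rule is a dynamic rule, which per the entry above applies to the ETL entry point only.

```python
MAX_RULES = 2000       # ruleset limit quoted in the DQ entry
MAX_BYTES = 65 * 1024  # "<=65KB", read here as 65 * 1024 bytes (assumption)


def build_ruleset(rules):
    """Join DQDL rule strings into a 'Rules = [...]' block, enforcing both limits."""
    if not rules:
        raise ValueError("a ruleset needs at least one rule")
    if len(rules) > MAX_RULES:
        raise ValueError(f"{len(rules)} rules exceeds the limit of {MAX_RULES}")
    body = ",\n    ".join(rules)
    text = f"Rules = [\n    {body}\n]"
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("ruleset exceeds the size limit")
    return text


# Hypothetical columns: order_id, customer_id, amount.
RULES = [
    'IsComplete "order_id"',
    'Uniqueness "order_id" > 0.95',
    '(IsComplete "customer_id") and (ColumnValues "amount" >= 0)',  # composite rule
    "RowCount > avg(last(10))",                                     # dynamic rule
]

ruleset = build_ruleset(RULES)
print(ruleset)
```

The resulting string is the kind of text a ruleset definition or the EvaluateDataQuality transform takes as input; validating size before submission keeps limit errors out of the job run.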

## APPENDIX B — KEY IAM ROLES & POLICIES

- AWSGlueServiceRole — crawler / ETL job execution role (read data stores, write Data Catalog)
- AWSGlueConsoleFullAccess / AWSGlueSchemaRegistryFullAccess / AWSGlueSchemaRegistryReadonlyAccess — Glue + Schema Registry management
- Data lake administrator (Lake Formation persona) — manages LF permissions, registers locations, defines LF-Tags
- AWSLakeFormationDataAdmin / AWSLakeFormationCrossAccountManager — Lake Formation admin + cross-account sharing
- Lake Formation grant model: principal = IAM role/user, SAML user/group, IAM Identity Center user/group, Quick user/group, AWS account / Organization / OU (cross-account)
- IAMAllowedPrincipals (backward-compat: IAM-only access) and ALLIAMPrincipals (LF + IAM) special groups
- AmazonDataZoneFullAccess / AmazonDataZoneDomainExecutionRolePolicy — DataZone domain admin + per-domain execution (GetLineageNode/ListLineageNodeHistory)
- DataZone PostLineageEvent — requires IAM ALLOW (authorized at API Gateway layer) to publish lineage
- AmazonS3TablesFullAccess / AmazonS3TablesReadOnlyAccess — S3 Tables (s3tables namespace) access
- AWS KMS key policy — Data Catalog / connection-password / S3 encryption (CMK)
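
To make the Lake Formation grant model above concrete, the sketch below builds the request body for a table-level grant as a plain dictionary. The function name `table_grant_request`, the role ARN, and the database and table names are hypothetical; the permission set is the Table list from Appendix A. The dictionary follows the shape that the boto3 `lakeformation` client's `grant_permissions` call accepts, but no AWS call is made here.

```python
# Table-level permissions as listed in Appendix A (GOV).
TABLE_PERMISSIONS = {"SELECT", "INSERT", "DELETE", "ALTER", "DROP", "DESCRIBE"}


def table_grant_request(principal_arn, database, table, permissions, catalog_id=None):
    """Build a GrantPermissions-style request for one table and one principal."""
    unknown = set(permissions) - TABLE_PERMISSIONS
    if unknown:
        raise ValueError(f"not a table permission: {sorted(unknown)}")
    table_resource = {"DatabaseName": database, "Name": table}
    if catalog_id is not None:
        # Omitted by default: CatalogId falls back to the caller's AWS account ID.
        table_resource["CatalogId"] = catalog_id
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": table_resource},
        "Permissions": sorted(permissions),
    }


# Hypothetical principal and table.
request = table_grant_request(
    "arn:aws:iam::111122223333:role/AnalystRole",
    database="sales",
    table="orders",
    permissions=["SELECT", "DESCRIBE"],
)
print(request)
# With credentials configured, one might then call:
#   boto3.client("lakeformation").grant_permissions(**request)
```

Validating the permission names locally is a design convenience only; Lake Formation performs its own checks when the grant is submitted.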

## APPENDIX C — NAMING, ARCHITECTURE & GA/PREVIEW NOTES

- **No single unified catalog (unlike Google Dataplex):** AWS deliberately splits the 'knowledge catalog' across three planes — **AWS Glue Data Catalog** (central *technical* metastore: databases/tables/partitions/schemas/connections), **AWS Lake Formation** (fine-grained *governance* on top of the same Data Catalog resources), and **Amazon DataZone** (the *business* catalog: domains/projects/assets/glossaries/subscriptions). They share the underlying Glue Data Catalog as the substrate.
- **DataZone -> SageMaker rebrand (in progress):** Amazon DataZone is being folded into **Amazon SageMaker Catalog** within **Amazon SageMaker Unified Studio**. Domains can be upgraded in place; assets, metadata forms, glossaries, subscriptions, projects, domain units carry over. Environments/environment profiles are replaced by SageMaker **project profiles**. API namespace is still largely `datazone:*`. Business metadata now also supported in **IAM-based** (not only IAM Identity Center) SageMaker domains (May 2026).
- **LF-Tags vs Google data attributes:** Lake Formation **LF-Tags** are key-value pairs attached to catalog/database/table/column and grant permissions via **LF-Tag expressions** (LF-TBAC / ABAC). They inherit down the hierarchy. This is broadly analogous to Dataplex **Data Attributes** + **Data Attribute Bindings**, but LF-Tags are AWS's *native* governance primitive (not a separate attribute store) and also drive cross-account sharing.
- **Cell-level security via data filters:** Lake Formation `CreateDataCellsFilter` combines a PartiQL **row filter expression** + column **include/exclude lists** to achieve **column-, row-, and cell-level** security. Filters apply to read (`SELECT`) only; integrated engines (Athena/EMR/Redshift Spectrum) enforce the filtering.
- **S3 Tables = managed Apache Iceberg:** A new **table bucket** type (separate `s3tables` service namespace) storing **namespaces -> tables** as managed Iceberg with **automatic maintenance** (compaction, snapshot management, unreferenced-file removal). Integrates with Glue Data Catalog + Lake Formation; supports **Iceberg V3**. Distinct from self-managed Iceberg/Hudi/Delta tables that Glue ETL handles natively via `--datalake-formats` (no connector).
- **Glue Data Quality entry-point asymmetry:** **Anomaly detection (DetectAnomalies/ML)**, **dynamic rules** (`last()`), **labels**, and **row-level error record identification** are supported in the **ETL** entry point but NOT in the **Data Catalog** entry point; conversely **rule recommendations** are Data-Catalog-only. AWS Glue Data Quality is built on open-source **DeeQu**; DQDL is its rule-definition language.
- **Preview / recent-GA flags:** Glue Catalog **semantic search + business context** = *Preview*. DataZone **OpenLineage data lineage** went GA in the next-gen SageMaker/DataZone. Glue Data Quality **anomaly detection + dynamic rules** GA Aug 2024; **complex composite rules + File\* rules** Nov 2024; **S3 Tables / RMS / LF-managed Iceberg in Data Catalog DQ** Jul 2025; **DQDL labels + constants** Nov 2025.
- **Schema Registry is a separate metastore:** The AWS Glue **Schema Registry** (registries -> schemas -> schema versions, 8 compatibility modes) governs *streaming* data contracts (Kafka/MSK/Kinesis/Flink/Lambda) and is distinct from the table metastore — both live under the Glue service but serve different planes.
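
The cell-level security note above (row filter expression plus column include/exclude lists) can be sketched as a request builder. This is an illustrative example under stated assumptions: the function `build_data_cells_filter`, the account ID, and the table, filter, and column names are hypothetical. The dictionary mirrors the `TableData` shape that the boto3 `lakeformation` client's `create_data_cells_filter` call accepts, and no AWS call is made.

```python
def build_data_cells_filter(account_id, database, table, name,
                            row_filter=None, include=None, exclude=None):
    """Build a CreateDataCellsFilter-style request combining row and column scope."""
    if include and exclude:
        raise ValueError("use an include list or an exclude list, not both")
    table_data = {
        "TableCatalogId": account_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
    }
    # Row scope: a PartiQL predicate, or the all-rows wildcard when none is given.
    if row_filter:
        table_data["RowFilter"] = {"FilterExpression": row_filter}
    else:
        table_data["RowFilter"] = {"AllRowsWildcard": {}}
    # Column scope: an explicit include list, or a wildcard minus excluded columns.
    if include:
        table_data["ColumnNames"] = list(include)
    else:
        table_data["ColumnWildcard"] = {"ExcludedColumnNames": list(exclude or [])}
    return {"TableData": table_data}


# Hypothetical filter: EU rows only, every column except 'ssn'.
cell_filter = build_data_cells_filter(
    "111122223333", "sales", "customers", "eu_no_ssn",
    row_filter="region = 'EU'",
    exclude=["ssn"],
)
print(cell_filter)
# With credentials configured, one might then call:
#   boto3.client("lakeformation").create_data_cells_filter(**cell_filter)
```

A row predicate alone gives row-level security, a column list alone gives column-level security, and supplying both yields the cell-level case described above.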

---
**Version:** 1.0.0 — extracted from live AWS documentation, 2026-06-18.
**Use for:** onboarding to the AWS data catalog/governance stack, architecture & fine-grained-access design, exam/cert prep, grounding an LLM/agent, mapping a data-governance program across Glue + Lake Formation + DataZone.
**Ask the graph:** "What must I understand before Cell-level security?" · "Trace Amazon S3 bucket -> Subscription fulfillment workflow." · "What depends on LF-Tag (key-value pair)?"
