Out-of-the-Box Connector for Apache Hive
Octopai now supports an out-of-the-box connector for Apache Hive, enabling full end-to-end (E2E) column-level lineage, metadata discovery, and integration with the Knowledge Hub for enhanced visibility and governance across Hive environments.
Key Capabilities:
- Automated Metadata Extraction – Retrieves tables, views, partitions, and query execution details from Hive.
- E2E Column Lineage – Tracks column-level transformations across Hive queries, joins, aggregations, and dependencies to provide precise lineage mapping.
- Cross-System Lineage – Connects Hive to ETL tools (Informatica, Spark), cloud data warehouses, and BI platforms, ensuring complete data flow visibility.
- Discovery & Knowledge Hub Integration – Enables metadata search, lineage exploration, and impact analysis through Octopai’s Discovery and Knowledge Hub, allowing users to easily find, analyze, and document Hive metadata.
- Change Detection & Impact Analysis – Identifies schema changes, column dependencies, and downstream impacts, helping teams manage changes proactively.
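To make the idea of column-level impact analysis concrete, the following is a minimal Python sketch. It is not Octopai's implementation or API: the column names and the lineage map are hypothetical and only illustrate how a change to one Hive column can be traced to every column derived from it.

```python
from collections import deque

# Hypothetical column-level lineage: source column -> columns derived from it.
# These names are illustrative only and do not come from a real environment.
LINEAGE = {
    "hive.sales.orders.amount": ["hive.sales.daily_totals.total_amount"],
    "hive.sales.orders.order_date": ["hive.sales.daily_totals.day"],
    "hive.sales.daily_totals.total_amount": ["bi.revenue_report.revenue"],
}


def downstream_columns(start, lineage):
    """Return every column reachable downstream of `start`, breadth-first."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        column = queue.popleft()
        for target in lineage.get(column, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Which columns are affected if orders.amount changes?
impacted = downstream_columns("hive.sales.orders.amount", LINEAGE)
print(impacted)
```

In this toy example, a change to the orders amount column reaches the Hive aggregate first and then the BI report column that reads from it.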
This new Hive connector enhances data governance, compliance, and operational intelligence by providing automated E2E lineage and metadata visibility across hybrid and cloud environments.
-------------------------------------------------------------------------------------------------------------
Out-of-the-Box Connector for Apache Impala
Octopai now offers native support for Apache Impala, enabling automated metadata extraction, cross-system and E2E column-level lineage, and integration with Discovery and the Knowledge Hub. This enhancement provides a comprehensive view of Impala queries, transformations, and dependencies within hybrid and cloud environments.
Key Capabilities:
- Metadata Extraction & Query Parsing – Automatically retrieves table structures, views, partitions, and SQL queries, capturing transformations at the query level.
- E2E Column Lineage – Traces column-level data flow across joins, aggregations, filters, and transformations, ensuring precise lineage tracking from source to consumption.
- Cross-System Lineage – Maps Impala interactions with upstream ETL processes (Informatica, Spark) and downstream BI/reporting tools, supporting hybrid cloud architectures.
- Discovery & Knowledge Hub Integration – Enables fast search, metadata analysis, and impact assessment via Octopai’s Discovery and Knowledge Hub, improving accessibility and governance.
- Change Impact & Anomaly Detection – Identifies schema modifications, column dependencies, and lineage breaks, helping teams manage risk and ensure data integrity.
This new Impala connector strengthens lineage automation, metadata intelligence, and operational insights, empowering teams with deeper visibility into Impala-based data ecosystems.
-------------------------------------------------------------------------------------------------------------
Cross-System & Inner-System Lineage for Pre-Post Informatica Workflows
Octopai now supports cross-system and inner-system lineage for Pre-Post Informatica PowerCenter workflows. This enhancement enables users to track data flow across systems before and after Informatica processing, ensuring full visibility into ETL dependencies, transformations, and downstream impacts.
Key Capabilities:
- Cross-System Lineage – Connects source systems, Informatica PowerCenter ETL workflows, and target environments, mapping data flow across multiple platforms.
- Inner-System Lineage – Captures detailed transformations within PowerCenter mappings, workflows, and sessions, including expressions, lookups, joins, and aggregations.
- Pre-Post ETL Tracking – Shows how data moves before Informatica ingestion (staging, raw data sources) and after target loading (data marts, reports, analytics tools).
- Impact Analysis & Change Management – Identifies upstream and downstream dependencies, helping assess risks from schema changes and transformation modifications.
This feature enhances troubleshooting, compliance, and governance by providing a complete end-to-end lineage view across Informatica PowerCenter environments.
-------------------------------------------------------------------------------------------------------------
CSV Export for Knowledge Hub Insights Dashboard (New, Updated, Suspended, Dropped Assets)
Users can now export Knowledge Hub Insights Dashboard results to CSV, enabling detailed analysis and external reporting of New, Updated, Suspended, and Dropped metadata assets. This enhancement provides greater flexibility in monitoring metadata lifecycle changes over a selected time range.
Key Capabilities:
- CSV Export for Insights Dashboard – Download metadata activity metrics for deeper analysis and audit tracking.
- Comprehensive Lifecycle Tracking – Capture asset state changes based on automatic harvesting, manual updates, and system refreshes.
- Enhanced Reporting & Compliance – Maintain records of metadata changes for governance and audit requirements.
Breakdown of Exported Asset Categories
1. New Assets
- Definition: Metadata assets that were newly created within the selected time range.
- Source: Automatically harvested, manually added, or imported via bulk operations.
- Use Case: Identify recently introduced datasets for validation and governance.
2. Updated Assets
- Definition: Assets whose metadata changed during the selected period, including both newly created and existing assets.
- Source: System-detected updates, manual edits, or metadata refreshes.
- Use Case: Track modifications across metadata properties, ensuring up-to-date documentation.
3. Suspended Assets
- Definition: Assets that were marked as inactive or under review during the selected period.
- Source: Manual user action only (not automated).
- Use Case: Monitor assets that require investigation or exclusion from active use.
4. Dropped Assets
- Definition: Assets that existed in a previous system scan but were no longer detected in the latest extraction.
- Source: Automatically identified as missing due to deletion, migration, or metadata refresh.
- Use Case: Detect assets that have been removed, ensuring traceability of decommissioned metadata.
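The scan-based categories above can be sketched as a comparison between two extraction snapshots. This is an illustrative Python example under stated assumptions, not Octopai's actual logic: the function name, the snapshot structure (asset name mapped to a metadata dictionary), and the sample assets are all hypothetical. Suspended assets are left out because they result from manual user action rather than from scans, and manually added or edited assets are not modeled here.

```python
def classify_assets(previous_scan, latest_scan):
    """Compare two scans (asset name -> metadata dict) and bucket asset names.

    Follows the definitions above: "updated" includes new assets as well as
    existing assets whose metadata changed between the two scans.
    """
    previous_ids = set(previous_scan)
    latest_ids = set(latest_scan)

    new = latest_ids - previous_ids
    dropped = previous_ids - latest_ids
    changed = {
        asset_id
        for asset_id in previous_ids & latest_ids
        if previous_scan[asset_id] != latest_scan[asset_id]
    }
    return {
        "new": sorted(new),
        "updated": sorted(new | changed),
        "dropped": sorted(dropped),
    }


# Hypothetical snapshots from two consecutive extractions.
previous = {
    "orders": {"columns": ["id", "amount"]},
    "customers": {"columns": ["id", "name"]},
    "legacy_log": {"columns": ["ts"]},
}
latest = {
    "orders": {"columns": ["id", "amount", "currency"]},
    "customers": {"columns": ["id", "name"]},
    "returns": {"columns": ["id", "order_id"]},
}

result = classify_assets(previous, latest)
print(result)
```

Here `returns` is new (and therefore also counted as updated), `orders` is updated because a column was added, `legacy_log` is dropped, and `customers` appears in no category because nothing changed.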
How to Use CSV Export
1. Navigate to the Knowledge Hub Insights Dashboard.
2. Select a date range to filter the relevant asset activity.
3. Click Export to CSV to download lifecycle metrics for further analysis.
4. Open the CSV file in Excel, BI tools, or any data processing platform for custom reporting.
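As one example of custom reporting, the exported file could be summarized with a short Python script. This is a sketch only: the column names ("Asset Name", "Status") and the sample rows are assumptions made for illustration, so check the header row of your actual export and adjust accordingly.

```python
import csv
import io
from collections import Counter

# Stand-in for the exported file. The column names and rows are assumed
# for illustration and may differ from the real export layout.
SAMPLE_EXPORT = """Asset Name,Status
orders,Updated
returns,New
legacy_log,Dropped
customers_tmp,Suspended
returns,Updated
"""


def count_by_status(csv_file, status_column="Status"):
    """Count rows per lifecycle status in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[status_column] for row in reader)


counts = count_by_status(io.StringIO(SAMPLE_EXPORT))
for status, total in sorted(counts.items()):
    print(f"{status}: {total}")
```

To run the same summary on a downloaded file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `count_by_status` in place of the in-memory sample.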
This feature enhances metadata governance by providing structured, exportable insights into metadata lifecycle changes across environments.
-------------------------------------------------------------------------------------------------------------
Optimized Knowledge Graph for Enhanced Performance
We have improved the infrastructure of the Knowledge Graph, enhancing its scalability, query efficiency, and overall performance. These enhancements provide faster metadata retrieval, improved lineage visualization, and reduced query response times across large datasets.
Key Enhancements:
- Infrastructure Upgrade – Migration to a better-optimized graph database backend, improving metadata processing speed.
- Scalability Improvements – Supports larger metadata volumes with enhanced indexing and query caching, reducing latency.
These improvements ensure a more seamless experience, allowing users to navigate large metadata ecosystems with greater efficiency and reliability.