<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.5">Jekyll</generator><link href="https://www.ozkary.dev/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.ozkary.dev/" rel="alternate" type="text/html" /><updated>2026-02-25T18:16:12-05:00</updated><id>https://www.ozkary.dev/feed.xml</id><title type="html">Ozkary Technologies</title><subtitle>A site with technology topics</subtitle><author><name>Oscar D. Garcia - Ozkary</name></author><entry><title type="html">AI Driven App Architecture - Smart Development Life Cycle Governance</title><link href="https://www.ozkary.dev/ai-driven-app-architecture-smart-development-life-cycle-governance/" rel="alternate" type="text/html" title="AI Driven App Architecture - Smart Development Life Cycle Governance" /><published>2026-02-25T00:00:00-05:00</published><updated>2026-02-25T08:00:00-05:00</updated><id>https://www.ozkary.dev/ai-driven-app-architecture-smart-development-life-cycle-governance</id><content type="html" xml:base="https://www.ozkary.dev/ai-driven-app-architecture-smart-development-life-cycle-governance/"><![CDATA[<h1 id="overview">Overview</h1>

<p>As development teams scale, maintaining architectural consistency becomes the biggest bottleneck. Documents are ignored, and linters only catch syntax errors, not design patterns.</p>

<p>In this session, we will demonstrate how to transform AI from a passive coding assistant into an active Architectural Enforcer. By embedding your “unwritten rules” directly into the repository configuration, you create a developer experience where the AI enforces your patterns in real-time.</p>

<p>We will explore how this shifts the workflow: new developers are guided by the AI from day one, preventing architectural leakage before a pull request is ever opened.</p>

<p><img src="../../assets/2026/ozkary-ai-driven-architecture-smart-development-life-cycle-governance.png" alt="AI Driven App Architecture - Smart Development Life Cycle Governance" title="AI Driven App Architecture - Smart Development Life Cycle Governance" /></p>

<h2 id="-featured-open-source-projects">🚀 Featured Open Source Projects</h2>
<p>Explore these curated resources to level up your engineering skills. If you find them helpful, a ⭐️ is much appreciated!</p>

<h3 id="️-data-engineering">🏗️ <a href="https://github.com/ozkary/data-engineering-mta-turnstile">Data Engineering</a></h3>
<blockquote>
  <p><strong>Focus:</strong> Real-world ETL &amp; MTA Turnstile Data<br />
<img src="https://img.shields.io/badge/Maintained-Yes-green.svg" alt="Maintained" /> <img src="https://img.shields.io/github/license/ozkary/data-engineering-mta-turnstile" alt="License" /></p>
</blockquote>

<h3 id="-artificial-intelligence">🤖 <a href="https://github.com/ozkary/ai-engineering">Artificial Intelligence</a></h3>
<blockquote>
  <p><strong>Focus:</strong> LLM Patterns and Agentic Workflows<br />
<img src="https://img.shields.io/badge/Status-Active_Development-blue.svg" alt="Status" /> <img src="https://img.shields.io/badge/Focus-Generative_AI-orange" alt="Topic" /></p>
</blockquote>

<h3 id="-machine-learning">📉 <a href="https://github.com/ozkary/machine-learning-engineering">Machine Learning</a></h3>
<blockquote>
  <p><strong>Focus:</strong> MLOps and Productionizing Models<br />
<img src="https://img.shields.io/badge/Build-Passing-brightgreen.svg" alt="Build" /> <img src="https://img.shields.io/badge/Stage-Production_Ready-blue" alt="Stage" /></p>
</blockquote>

<hr />
<p>💡 <strong>Contribute:</strong> Found a bug or have a suggestion? Open an issue and be part of the open-source project!</p>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/wvhb9B3DeMY?si=gRHAES40_s1HdMkX" title="AI Driven App Architecture - Smart Development Life Cycle Governance" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<blockquote>
  <p>👍 Subscribe to the channel to get notified about new events!</p>
</blockquote>

<h3 id="video-agenda">Video Agenda</h3>

<p><strong>The Problem: Architectural Drift</strong></p>

<p>Why strict rules (Controller-View, Pascal/camelCase) degrade over time and how AI can fix it.</p>

<p><strong>The Intelligence Engine</strong></p>

<p>Breakdown of the core components: Global Rules, Contextual Guardrails, Agent Tools, and Directory Structure.</p>

<p><strong>Configuration: Global Governance</strong></p>

<p>Setting up global “system prompts” for the repository to enforce tech stack and naming conventions.</p>

<p><strong>Configuration: Contextual Guardrails</strong></p>

<p>Creating “firewalls” for specific folders (e.g., preventing logic in views, preventing API calls in Controllers).</p>

<p><strong>Configuration: The Tooling</strong></p>

<p>Building custom Slash Commands (/new-module) to automate “Vertical Slice” scaffolding.</p>

<p><strong>Configuration: The Auditor Agent</strong></p>

<p>Implementing a specialized “Gatekeeper” persona that scans imports to ensure strict layer separation.</p>

<p><strong>Agent Mapping</strong></p>

<p>A conceptual framework comparing repository configuration to autonomous agent architecture.</p>

<p><strong>💡 Why Attend?</strong></p>

<ul>
  <li>Stop writing boilerplate: Learn to automate complex folder structures with one command.</li>
  <li>Reduce PR Reviews: Shift governance “left” by having the AI catch architectural errors instantly.</li>
  <li>Interactive Demo: See the .github configuration in action on a real codebase.</li>
  <li>Takeaway Code: Leave with the copy-paste markdown templates to implement this in your own repo tomorrow.</li>
</ul>

<p><strong>Target Audience</strong></p>

<ul>
  <li>Tech Leads &amp; Architects who need to enforce standards across scaling teams.</li>
  <li>Developers who are tired of correcting the same patterns in code reviews.</li>
  <li>DevOps Engineers interested in “Governance as Code.”</li>
  <li>Leadership teams that are trying to raise standards and productivity in their organizations.</li>
</ul>

<h2 id="presentation">Presentation</h2>

<h3 id="setting-the-stage">SETTING THE STAGE</h3>

<p><strong>The Context</strong></p>
<ul>
  <li>We enforce a strict pattern using the ViCSA (View-Controller-Service-API) architecture.</li>
  <li>PascalCase for UI Components.</li>
  <li>camelCase for Logic &amp; Services.</li>
  <li>Separation of Concerns (SoC) is non-negotiable.</li>
</ul>

<p><strong>The Problem</strong></p>
<ul>
  <li>Architectural Drift: Patterns degrade over time.</li>
  <li>Passive Docs: Wiki pages are ignored.</li>
  <li>Linter Limits: Linters catch syntax, not architecture.</li>
  <li>Solution: Active Governance via AI.</li>
</ul>

<h3 id="the-intelligence-engine">THE INTELLIGENCE ENGINE</h3>

<p><strong>Core AI Policies</strong></p>

<ul>
  <li>Centralized Config: Rules live in the repo, not the user’s IDE.</li>
  <li>Global Rules: Applied to every interaction (System Prompt).</li>
  <li>Contextual Rules: Triggered only when specific files are opened.</li>
  <li>Agent Tools: Custom commands to scaffold new components, controllers or services.</li>
</ul>

<p><img src="../../assets/2026/ozkary-ai-driven-architecture-project-structure.png" alt="AI Driven App Architecture - Smart Development Life Cycle Governance - Project Structure" /></p>

<h3 id="configuration-global-governance">CONFIGURATION: GLOBAL GOVERNANCE</h3>

<p><strong>Global Instructions</strong></p>

<p><strong>File:</strong> <code class="language-plaintext highlighter-rouge">.github/copilot-instructions.md</code></p>

<p>This acts as the System Prompt for the entire repository. It is silently added to every interaction. A trimmed-down example appears after the list below.</p>

<ul>
  <li>Tech Stack: TS, Tailwind, Hooks.</li>
  <li>Naming: Pascal vs camelCase.</li>
  <li>Flow: <code class="language-plaintext highlighter-rouge">View → Controller → Service → API</code>.</li>
</ul>
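
<p>As a point of reference, a trimmed-down version of this file might look like the sketch below. The stack and rules shown are illustrative, not the exact production file; adapt them to your project.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Global Instructions (System Prompt)
File: `.github/copilot-instructions.md`

## Tech Stack
- TypeScript, Tailwind CSS, React Hooks. No class components.

## Naming Conventions
- UI component folders and files: PascalCase.
- Logic, services, and APIs: camelCase.

## Data Flow (ViCSA)
- View → Controller → Service → API.
- Views never call services or APIs directly.
</code></pre></div></div>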

<p><img src="../../assets/2026/ozkary-ai-driven-architecture-global-governance.png" alt="AI Driven App Architecture - Smart Development Life Cycle Governance - Global Governance" /></p>

<h3 id="dev-experience-the-silent-enforcer">DEV EXPERIENCE: THE SILENT ENFORCER</h3>

<p><strong>Without Config</strong></p>

<p>A developer asks:</p>

<p><code class="language-plaintext highlighter-rouge">How do I create a new service?</code></p>

<ul>
  <li>AI suggests a generic Class-based service.</li>
  <li>Suggests creating a utils.js file.</li>
  <li>Ignores project folder structure.</li>
</ul>

<p><strong>With Config</strong></p>

<p>A developer asks:</p>

<p><code class="language-plaintext highlighter-rouge">How do I create a new service?</code></p>

<ul>
  <li>AI reads the Governance.</li>
  <li>Response: <code class="language-plaintext highlighter-rouge">Create src/services/userAuth/index.ts using a functional export, as per project standards.</code></li>
</ul>

<h3 id="configuration-contextual-guardrails">CONFIGURATION: CONTEXTUAL GUARDRAILS</h3>

<p><strong>View Layer Rules</strong></p>

<p><strong>File:</strong> <code class="language-plaintext highlighter-rouge">.github/instructions/view-layer.md</code> (sketched after these lists)</p>

<p><strong>Trigger:</strong> Opening any <code class="language-plaintext highlighter-rouge">**/*.tsx</code> file.</p>

<ul>
  <li>“You are a View.”</li>
  <li>“No Logic allowed.”</li>
  <li>“No direct API calls.”</li>
</ul>

<p><strong>Controller Layer Rules</strong></p>

<p><strong>File:</strong> <code class="language-plaintext highlighter-rouge">.github/instructions/controller-layer.md</code></p>

<p><strong>Trigger:</strong> Opening any <code class="language-plaintext highlighter-rouge">**/controller.ts</code> file.</p>

<ul>
  <li>“You are a Controller.”</li>
  <li>“Use Services, NOT Fetch.”</li>
  <li>“Manage State here.”</li>
</ul>
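
<p>As a sketch, the view-layer file pairs a file glob with its rules. (The <code class="language-plaintext highlighter-rouge">applyTo</code> front matter shown here follows the VS Code Copilot instructions-file convention; other tools scope rules differently.)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
applyTo: "**/*.tsx"
---
# View Layer Rules
- You are a View. Render UI only.
- No logic allowed; delegate state and handlers to the sibling `controller.ts`.
- No direct API calls; never import from `src/services` or `src/apis`.
</code></pre></div></div>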

<h3 id="dev-experience-real-time-intervention">DEV EXPERIENCE: REAL-TIME INTERVENTION</h3>

<p><strong>The Scenario</strong></p>

<ul>
  <li>A developer tries to write <code class="language-plaintext highlighter-rouge">fetch()</code> inside a UI Component (<code class="language-plaintext highlighter-rouge">index.tsx</code>).</li>
  <li>They ask Copilot: “Write a fetch call here for me.”</li>
</ul>

<p><strong>The Intervention</strong></p>

<p><strong>Ghost Text:</strong> Copilot refuses to autocomplete the network call.</p>

<p><strong>Chat Reply:</strong></p>

<p><code class="language-plaintext highlighter-rouge">I cannot. This is a View file. Please move this logic to the sibling Controller (index.ts) and import it.</code></p>

<h3 id="configuration-the-tooling">CONFIGURATION: THE TOOLING</h3>

<p><strong>Prompt Library</strong></p>

<p><strong>File:</strong> <code class="language-plaintext highlighter-rouge">.github/prompts/new-module.md</code></p>

<p>These act as Agent Tools or “Slash Commands”.</p>

<ul>
  <li>Goal: Automate the “Vertical Slice”.</li>
  <li>Benefit: Complex scaffolding logic is stored in the repo, not in the developer’s head.</li>
  <li>Usage: <code class="language-plaintext highlighter-rouge">/new-module</code> (a related <code class="language-plaintext highlighter-rouge">/new-component</code> prompt is shown below)</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Prompt Library (The Scaffolder)
File: `.github/prompts/new-component.md`
Goal: Automate the creation of a standalone UI Component with optional Service/API layers.

# Create New Component
I need to generate a new component following our **Folder-as-Namespace** pattern.
**Command:** `/new-component: [ComponentName] [args]`

Please generate the code blocks for the layers requested in the arguments (service, api). 
*Note: Logic folders must be camelCase. UI folders must be PascalCase.*

---

### Component Layer (Required)
**Folder:** `src/components/[ComponentName]/`
- **File:** `controller.ts` (Controller): Logic and State only.
- **File:** `index.tsx` (View): Pure UI. Imports Controller.
---


### Service Layer (Optional)
*Condition: Generate only if 'service' is present in [args].*

**File:** `src/services/[serviceName]/index.ts`
- **Role:** Business logic and data transformation.
- **Code:** Import the API (if requested). Export a service object or functional exports.

---

### API Layer (Optional)
*Condition: Generate only if 'api' is present in [args].*

**File:** `src/apis/[apiName]/index.ts`
- **Role:** Define specific endpoints.
- **Code:** Import `coreClient` from `src/apis/index.ts`. Export async functions with typed responses.

---

### Style Guidelines
- **Typing:** Use TypeScript interfaces for all Props and Data models.
- **Separation:** Logic stays in `controller.ts`, JSX stays in `index.tsx`.
- **Naming:** Components use PascalCase; Services/APIs use camelCase.
</code></pre></div></div>

<h3 id="dev-experience-the-scaffolding">DEV EXPERIENCE: THE SCAFFOLDING</h3>

<p><strong>The Command</strong></p>

<p>Starting a new feature called “Sales Dashboard”.</p>

<p><strong>Action:</strong></p>

<p><code class="language-plaintext highlighter-rouge">/new-module featureName:Sales Dashboard</code></p>

<p><strong>The Execution</strong></p>

<ul>
  <li>Analyzes the request.</li>
  <li>Applies <code class="language-plaintext highlighter-rouge">PascalCase</code> to Containers/Components folders.</li>
  <li>Applies <code class="language-plaintext highlighter-rouge">camelCase</code> to api/service folders.</li>
  <li>Generates the <code class="language-plaintext highlighter-rouge">Controller-View</code> pair instantly.</li>
</ul>

<h3 id="the-result-generated-architecture">THE RESULT: GENERATED ARCHITECTURE</h3>

<p><strong>The Results</strong></p>

<ul>
  <li>Layers generated instantly.</li>
  <li>Correct naming conventions applied.</li>
  <li>Zero manual boilerplate; the generated layout is sketched below.</li>
</ul>
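
<p>Under the prompt library’s <strong>Folder-as-Namespace</strong> pattern, a “Sales Dashboard” request with service and api arguments would scaffold roughly this layout (paths are illustrative):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>src/
  components/
    SalesDashboard/        # PascalCase (UI)
      index.tsx            # View: pure UI, imports the controller
      controller.ts        # Controller: logic and state only
  services/
    salesDashboard/        # camelCase (logic)
      index.ts             # Service: business logic, calls the API
  apis/
    salesDashboard/        # camelCase (data access)
      index.ts             # API: typed endpoints via coreClient
</code></pre></div></div>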

<p><img src="../../assets/2026/ozkary-ai-driven-architecture-project-structure.png" alt="AI Driven App Architecture - Smart Development Life Cycle Governance - Project Structure" /></p>

<h3 id="configuration-the-auditor-agent">CONFIGURATION: THE AUDITOR AGENT</h3>

<p><strong>Specialized Persona</strong></p>

<p><strong>File:</strong> <code class="language-plaintext highlighter-rouge">.github/agents/arch-auditor.md</code></p>

<p>This creates a named Agent that acts as a Gatekeeper. It doesn’t write features; it verifies them.</p>

<ul>
  <li>Role: Architecture Enforcer.</li>
  <li>Task: Scans imports to ensure strict layer separation.</li>
  <li>Rule: “Views never talk to APIs.”</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Custom AI Agent (The Reviewer)
Agent ID: `@vicsa-auditor`

Context: A bot that ensures the chain of command is respected using the ViCSA architecture (View Controller Service API)

## Primary Objective
name: Architecture Auditor
description: Verifies strict separation of Controller, Service, and View layers.
tools: [code-search]

---
## Role
You ensure the integrity of the data flow: View -&gt; Controller -&gt; Service -&gt; API.

## Audit Logic
When asked to "Audit this feature":

1. **Check the View (.tsx):**
   - FAIL if it imports `src/services`.
   - FAIL if it imports `src/apis`.
   - PASS only if it imports `./index`.

2. **Check the Controller (.ts):**
   - FAIL if it uses `fetch` or `axios`.
   - PASS only if it delegates to `src/services`.

3. **Check the Service:**
   - FAIL if it defines its own URL logic.
   - PASS only if it imports `src/apis/index.ts`.

</code></pre></div></div>

<h3 id="dev-experience-the-code-review">DEV EXPERIENCE: THE CODE REVIEW</h3>

<p><strong>The Interaction</strong></p>

<p>Before raising a pull request, the developer invokes the auditor.</p>

<p><strong>Prompt:</strong></p>

<p><code class="language-plaintext highlighter-rouge">@vicsa-auditor check this component for violations.</code></p>

<p><strong>Response:</strong></p>

<p><code class="language-plaintext highlighter-rouge">✅ PASS: SalesDashboard/index.tsx imports only from its sibling controller. No direct API calls found.</code></p>

<p><img src="../../assets/2026/ozkary-ai-driven-architecture-review-process.png" alt="AI Driven App Architecture - Smart Development Life Cycle Governance - Review Process" /></p>

<h3 id="the-autonomy-advantage">THE AUTONOMY ADVANTAGE</h3>

<p>AI enforces the ViCSA architecture through continuous observation and autonomous execution.</p>

<ul>
  <li><strong>Perception</strong>: Continuously observes the active workspace, file paths (e.g., src/components/), and context to understand the developer’s structural intent.</li>
  <li><strong>Reasoning</strong>: Evaluates the perceived context against the repository’s .github Guardrails, determining if a View is bypassing a Controller or violating Separation of Concerns (SoC).</li>
  <li><strong>Action</strong>: Executes autonomous scaffolding, enforces strict ViCSA governance, and provides feedback with recommended fixes.</li>
</ul>

<h3 id="summary--agent-mapping">SUMMARY &amp; AGENT MAPPING</h3>

<p>Embedding governance directly into the repository transforms the development lifecycle. It replaces passive wiki pages with active, real-time enforcement, ensuring that every AI suggestion aligns with architectural standards. This eliminates “drift”, accelerates onboarding, and turns Copilot into a domain-expert partner.</p>

<table>
  <thead>
    <tr>
      <th>Agent Component</th>
      <th>GitHub Implementation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>System Prompt</td>
      <td>Global Instructions (copilot-instructions.md)</td>
    </tr>
    <tr>
      <td>Context / RAG</td>
      <td>Modular Instructions (instructions/*.md)</td>
    </tr>
    <tr>
      <td>Tools / Functions</td>
      <td>Prompt Library (prompts/*.md)</td>
    </tr>
    <tr>
      <td>Human Prompt</td>
      <td>Chat Window</td>
    </tr>
    <tr>
      <td>Persona</td>
      <td>Agent Personas (e.g., agents/arch-auditor.md)</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p>RAG: Retrieval-Augmented Generation</p>
</blockquote>

<h3 id="-lets-connect--build-together">🌟 Let’s Connect &amp; Build Together</h3>
<p>Thanks for reading! 😊 If you enjoyed these resources, let’s stay in touch! I share deep-dives into AI/ML patterns and host community events here:</p>

<ul>
  <li><strong><a href="https://gdg.community.dev/gdg-broward-county-fl/">GDG Broward</a></strong>: Join our local dev community for meetups and workshops.</li>
  <li><strong><a href="https://globalai.community/chapters/jacksonville/">Global AI Events</a></strong>: Join Global AI Events.</li>
  <li><strong><a href="https://www.linkedin.com/in/oscardgarcia">LinkedIn</a></strong>: Let’s connect professionally! I share insights on engineering.</li>
  <li><strong><a href="https://github.com/ozkary">GitHub</a></strong>: Follow my open-source journey and star the repos you find useful.</li>
  <li><strong><a href="https://www.youtube.com/@ozkary">YouTube</a></strong>: Watch step-by-step tutorials on the projects listed above.</li>
  <li><strong><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></strong> / <strong><a href="https://x.com/ozkary">X / Twitter</a></strong>: Daily tech updates and quick engineering tips.</li>
</ul>

<p>👉 <em>Originally published at <a href="https://www.ozkary.com">ozkary.com</a></em></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="code" /><category term="cloud" /><category term="github" /><category term="vscode" /><category term="ai" /><category term="ai-agent" /><summary type="html"><![CDATA[As development teams scale, maintaining architectural consistency becomes the biggest bottleneck. Documents are ignored, and linters only catch syntax errors, not design patterns. We will explore how this shifts the workflow: new developers are guided by the AI from day one, preventing architectural leakage before a pull request is ever opened.]]></summary></entry><entry><title type="html">The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment</title><link href="https://www.ozkary.dev/the-cognitive-data-lakehouse-ai-driven-unification-and-semantic-modeling-in-a-zero-etl-environment/" rel="alternate" type="text/html" title="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment" /><published>2026-01-21T00:00:00-05:00</published><updated>2026-01-26T08:00:00-05:00</updated><id>https://www.ozkary.dev/the-cognitive-data-lakehouse-ai-driven-unification-and-semantic-modeling-in-a-zero-etl-environment</id><content type="html" xml:base="https://www.ozkary.dev/the-cognitive-data-lakehouse-ai-driven-unification-and-semantic-modeling-in-a-zero-etl-environment/"><![CDATA[<h1 id="overview">Overview</h1>

<p>In the modern data landscape, the wall between “where data lives” and “how we get insights” is crumbling. This session focuses on the Cognitive Data Lakehouse: a paradigm shift that allows developers to treat a fragmented data lake as a unified, high-performance warehouse.</p>

<p>We will explore how to move beyond brittle ETL pipelines using Zero-ETL architecture in the cloud. The core of our discussion will center on using integrated AI capabilities and semantic modeling to solve the “Metadata Mess” inherent in global manufacturing feeds without moving a single byte of data. From raw telemetry in object storage to semantic intelligence via large language models, we’ll show you the real-world application of AI in modern data engineering.</p>

<p><img src="../../assets/2026/ozkary-the-cognitive-data-lakehouse-ai-driven-unification-and-semantic-modeling-in-a-zero-etl-environment.png" alt="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment" title="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment" /></p>

<h2 id="-featured-open-source-projects">🚀 Featured Open Source Projects</h2>
<p>Explore these curated resources to level up your engineering skills. If you find them helpful, a ⭐️ is much appreciated!</p>

<h3 id="️-data-engineering">🏗️ <a href="https://github.com/ozkary/data-engineering-mta-turnstile">Data Engineering</a></h3>
<blockquote>
  <p><strong>Focus:</strong> Real-world ETL &amp; MTA Turnstile Data<br />
<img src="https://img.shields.io/badge/Maintained-Yes-green.svg" alt="Maintained" /> <img src="https://img.shields.io/github/license/ozkary/data-engineering-mta-turnstile" alt="License" /></p>
</blockquote>

<h3 id="-artificial-intelligence">🤖 <a href="https://github.com/ozkary/ai-engineering">Artificial Intelligence</a></h3>
<blockquote>
  <p><strong>Focus:</strong> LLM Patterns and Agentic Workflows<br />
<img src="https://img.shields.io/badge/Status-Active_Development-blue.svg" alt="Status" /> <img src="https://img.shields.io/badge/Focus-Generative_AI-orange" alt="Topic" /></p>
</blockquote>

<h3 id="-machine-learning">📉 <a href="https://github.com/ozkary/machine-learning-engineering">Machine Learning</a></h3>
<blockquote>
  <p><strong>Focus:</strong> MLOps and Productionizing Models<br />
<img src="https://img.shields.io/badge/Build-Passing-brightgreen.svg" alt="Build" /> <img src="https://img.shields.io/badge/Stage-Production_Ready-blue" alt="Stage" /></p>
</blockquote>

<hr />
<p>💡 <strong>Contribute:</strong> Found a bug or have a suggestion? Open an issue! and be part of the open source project.</p>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/nfJl-4BxqyY?si=mHyV5N547HqZ0rJx" title="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<h3 id="video-agenda">Video Agenda</h3>

<p><strong>Phase 1: Foundations &amp; The Zero-ETL Strategy</strong></p>

<p>We kick off with the infrastructure layer. We’ll discuss the design of cross-region telemetry tables and how modern cloud engines allow us to query raw files in object storage with the performance of a native table. We’ll establish why zero data movement is the goal for modern scalability.</p>

<p><strong>Phase 2: Confronting the Metadata Mess</strong></p>

<p>Schema drift and inconsistent naming across global regions are the enemies of unified analytics. We will look at why traditional manual mapping fails and how we can use AI inference to bridge these gaps and standardize naming conventions automatically.</p>

<p><strong>Phase 3: AI-Driven Unification &amp; Semantic Modeling</strong></p>

<p>The “Cognitive” part of the Lakehouse. We’ll dive into the technical implementation of registering AI models directly within your data warehouse environment. You’ll see how to create an abstraction layer that uses AI to normalize data on the fly, creating a robust semantic model.</p>

<p><strong>Phase 4: Scaling to a Global Feed</strong></p>

<p>Finally, we’ll demonstrate the DevOps workflow for integrating a new international factory feed into a global telemetry view. We’ll show how to maintain a “Single Source of Intelligence” that BI tools and analysts can consume without needing to know the complexities of the underlying lake.</p>

<p><strong>💡 Why Attend?</strong></p>

<ul>
  <li>Master Modern Architecture: Learn the “Abstraction Layer” design pattern that is replacing traditional, slow ETL/ELT processes.</li>
  <li>Hands-on AI for Data Ops: See exactly how to use AI and semantic modeling within SQL-based workflows to automate data cleaning and schema mapping.</li>
  <li>Scale Without Pain: Discover how to manage global data sources (multi-region, multi-format) through a single governing layer.</li>
  <li>Developer Networking: Connect with other data architects, engineering leaders, and professionals solving similar scale and complexity challenges.</li>
</ul>

<p><strong>Target Audience:</strong> Data Engineers, Analytics Architects, Cloud Developers, and anyone interested in the intersection of Big Data and Generative AI.</p>

<h2 id="presentation">Presentation</h2>
<h3 id="phase-1-the-zero-etl-strategy">Phase 1: The Zero-ETL Strategy</h3>
<h4 id="infrastructure-data-stays-local">INFRASTRUCTURE: DATA STAYS LOCAL</h4>

<p><strong>Architecting for Scale</strong></p>

<ul>
  <li>Storage Decoupling: Raw files remain in the Data Lake, eliminating replication overhead.</li>
  <li>Virtual Access: Data Warehouse external tables allow immediate querying of CSV, Parquet, and JSON.</li>
  <li>Minimal Latency: No waiting for ingest pipelines; analysis starts upon file arrival.</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-raw-zone.png" alt="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment -  Medallion Architecture Design Diagram " title="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment - Data Stays Local " /></p>

<h4 id="unmatched-storage-efficiency">UNMATCHED STORAGE EFFICIENCY</h4>

<p><strong>Zero Data Replication</strong></p>

<ul>
  <li>Traditional ETL requires moving data across multiple tiers. Our architecture ensures a single source of truth with zero data movement between GCS and BigQuery compute.</li>
  <li>This is similar to the Bronze Zone in a Medallion Architecture.</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-bronze-zone.png" alt="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment -  Medallion Architecture Design Diagram " title="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment - Storage Efficiency" /></p>

<h3 id="phase-2-the-metadata-mess">Phase 2: The Metadata Mess</h3>
<h4 id="challenges-of-unification">CHALLENGES OF UNIFICATION</h4>

<p><strong>Schema Friction</strong></p>
<ul>
  <li>Feeds arrive with inconsistent headers (e.g., ‘Device Number’ vs ‘deviceNo’). Manual aliasing is fragile and slow.</li>
</ul>

<p><strong>Entity Drift</strong></p>
<ul>
  <li>Names and IDs vary across systems, preventing standard joins from matching records effectively.</li>
</ul>

<p><strong>Type Mismatches</strong></p>
<ul>
  <li>Varying data types for the same concept (Integer vs String) crash standard SQL aggregation views; see the sketch below.</li>
</ul>
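
<p>As a minimal illustration of the type-mismatch problem, the sketch below unifies two hypothetical feeds where <code class="language-plaintext highlighter-rouge">bay_id</code> arrives as INT64 in one source and STRING in the other; <code class="language-plaintext highlighter-rouge">SAFE_CAST</code> yields NULL instead of crashing the view on bad values:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical sources: source_a stores bay_id as INT64, source_b as STRING
SELECT device_number, bay_id
FROM `smart_factory.source_a`
UNION ALL
SELECT device_number, SAFE_CAST(bay_id AS INT64) AS bay_id
FROM `smart_factory.source_b`;
</code></pre></div></div>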

<h3 id="phase-3-the-ai-solution">Phase 3: The AI Solution</h3>
<h4 id="bigquery-studio-the-ai-interface">BIGQUERY STUDIO: THE AI INTERFACE</h4>

<p><strong>Remote AI Registration</strong></p>
<ul>
  <li>Register Gemini Pro directly inside BigQuery to enable cognitive functions within your SQL workspace.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="n">MODEL</span> <span class="nv">`gemini_remote`</span>
<span class="n">REMOTE</span> <span class="k">WITH</span> <span class="k">CONNECTION</span> <span class="nv">`bq_connection`</span>
<span class="k">OPTIONS</span><span class="p">(</span><span class="n">endpoint</span> <span class="o">=</span> <span class="s1">'gemini-1.5-pro'</span><span class="p">);</span>

</code></pre></div></div>

<p><strong>Automated Inference</strong></p>
<ul>
  <li>AI “reads” information schemas to infer mapping logic, moving you from Code Author to Logic Approver.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">ml_generate_text_result</span>
<span class="k">FROM</span> <span class="n">ML</span><span class="p">.</span><span class="n">GENERATE_TEXT</span><span class="p">(</span>
  <span class="n">MODEL</span> <span class="nv">`gemini_remote`</span><span class="p">,</span>
  <span class="p">(</span><span class="k">SELECT</span> <span class="nv">"Compare Source A and B schemas. Write a SQL view to unify them."</span> <span class="k">AS</span> <span class="n">prompt</span><span class="p">)</span>
<span class="p">);</span>

</code></pre></div></div>
<h4 id="ai-assisted-schema-discovery">AI-ASSISTED SCHEMA DISCOVERY</h4>
<p><strong>Prompting for Base Tables</strong></p>
<ul>
  <li>Using AI to generate the DDL for external tables by pointing to compressed feeds in the lake (USA &amp; MEX factories).</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">ml_generate_text_result</span>
<span class="k">FROM</span> <span class="n">ML</span><span class="p">.</span><span class="n">GENERATE_TEXT</span><span class="p">(</span>
  <span class="n">MODEL</span> <span class="nv">`gemini_remote`</span><span class="p">,</span>
  <span class="p">(</span><span class="k">SELECT</span> <span class="nv">"Create External Tables as smart_factory.us_telemetry with path 'gs://factory-dl/us/dev-540/telemetry-*.csv.gz' '. Include option CSV, GZIP compression and skip 1 row. Infer and add the schema using lower case"</span> <span class="k">AS</span> <span class="n">prompt</span><span class="p">));</span>

<span class="k">SELECT</span> <span class="n">ml_generate_text_result</span>
<span class="k">FROM</span> <span class="n">ML</span><span class="p">.</span><span class="n">GENERATE_TEXT</span><span class="p">(</span>
  <span class="n">MODEL</span> <span class="nv">`gemini_remote`</span><span class="p">,</span>
  <span class="p">(</span><span class="k">SELECT</span> <span class="nv">"Create External Tables as smart_factory.mx_telemetry with path 'gs://factory-dl/mx/dev-940/telemetry-*.csv.gz' '. Include option CSV, GZIP compression and skip 1 row. Use schema device_number STRING, bay_id INT64, factory STRING, created STRING"</span> <span class="k">AS</span> <span class="n">prompt</span><span class="p">));</span>

</code></pre></div></div>

<p><strong>Generated BigLake DDL</strong></p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- USA Factory Feed</span>
<span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">EXTERNAL</span> <span class="k">TABLE</span> <span class="nv">`smart_factory.us_telemetry`</span> <span class="p">(</span>
  <span class="n">device_number</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">bay_id</span> <span class="n">INT64</span><span class="p">,</span>
  <span class="n">factory</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">created</span> <span class="n">STRING</span>
<span class="p">)</span>
<span class="k">OPTIONS</span> <span class="p">(</span>
  <span class="n">format</span> <span class="o">=</span> <span class="s1">'CSV'</span><span class="p">,</span>
  <span class="n">uris</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'gs://factory-dl/us/dev-540/telemetry*.csv.gz'</span><span class="p">],</span>
  <span class="n">skip_leading_rows</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
  <span class="n">compression</span> <span class="o">=</span> <span class="s1">'GZIP'</span>
<span class="p">);</span>

<span class="c1">-- MEX Factory Feed</span>
<span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">EXTERNAL</span> <span class="k">TABLE</span> <span class="nv">`smart_factory.mx_telemetry`</span> <span class="p">(</span>
  <span class="n">device_number</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">bay_id</span> <span class="n">INT64</span><span class="p">,</span>
  <span class="n">factory</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">created</span> <span class="n">STRING</span>
<span class="p">)</span>
<span class="k">OPTIONS</span> <span class="p">(</span>
  <span class="n">format</span> <span class="o">=</span> <span class="s1">'CSV'</span><span class="p">,</span>
  <span class="n">uris</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'gs://factory-dl/mx/dev-940/telemetry*.csv.gz'</span><span class="p">],</span>
  <span class="n">skip_leading_rows</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
  <span class="n">compression</span> <span class="o">=</span> <span class="s1">'GZIP'</span>
<span class="p">);</span>


</code></pre></div></div>

<h4 id="ai-abstraction-the-view-layer">AI-ABSTRACTION: THE VIEW LAYER</h4>
<p><strong>Generating the Interface</strong></p>
<ul>
  <li>AI creates a clean abstraction view for each external table, decoupling raw storage from the analytics model.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- AI Instruction</span>
<span class="nv">"Create a view named 
smart_factory.vw_us_telemetry 
selecting all columns from the
us_telemetry table. Safe cast the created column as datetime."</span>
</code></pre></div></div>

<p><strong>Abstraction Layer DDL</strong></p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Semantic Abstraction Layer</span>
<span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">VIEW</span> <span class="nv">`smart_factory.vw_us_telemetry`</span> <span class="k">AS</span>
<span class="k">SELECT</span> 
  <span class="n">device_number</span><span class="p">,</span>
  <span class="n">bay_id</span><span class="p">,</span>
  <span class="n">factory</span><span class="p">,</span>
  <span class="n">SAFE_CAST</span><span class="p">(</span><span class="n">created</span> <span class="k">as</span> <span class="nb">DATETIME</span><span class="p">)</span> <span class="k">AS</span> <span class="n">created</span>
<span class="k">FROM</span> <span class="nv">`smart_factory.us_telemetry`</span><span class="p">;</span>

</code></pre></div></div>

<h4 id="cognitive-unification">COGNITIVE UNIFICATION</h4>

<p><strong>The Multi-Region Model</strong></p>

<ul>
  <li>The unified view now consumes from the abstraction layer, ensuring that changes to raw storage don’t break the views downstream.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- AI Instruction</span>
<span class="nv">"Create a view with name
smart_factory.vw_telemetry that creates a union of all the fields from the views vw_[region]_telemetry. The regions include us and mx. List out all the field names. Never use * for field names"</span>

</code></pre></div></div>

<p><strong>Unified Global View</strong></p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Semantic Abstraction Layer</span>
<span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">VIEW</span> <span class="nv">`smart_factory.vw_telemetry`</span> <span class="k">AS</span>
<span class="k">SELECT</span> 
  <span class="n">device_number</span><span class="p">,</span>
  <span class="n">bay_id</span><span class="p">,</span>
  <span class="n">factory</span><span class="p">,</span>
  <span class="n">created</span>
<span class="k">FROM</span> <span class="nv">`smart_factory.vw_us_telemetry`</span>
<span class="k">UNION</span> <span class="k">ALL</span>
<span class="k">SELECT</span> 
  <span class="n">device_number</span><span class="p">,</span>
  <span class="n">bay_id</span><span class="p">,</span>
  <span class="n">factory</span><span class="p">,</span>
  <span class="n">created</span>
<span class="k">FROM</span> <span class="nv">`smart_factory.vw_mx_telemetry`</span>

</code></pre></div></div>

<h4 id="scaling-to-china-factory">SCALING TO CHINA FACTORY</h4>

<p><strong>Evolving the Model</strong></p>

<ul>
  <li>Adding the new China feed by generating the External Table definition via AI.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">EXTERNAL</span> <span class="k">TABLE</span> <span class="nv">`smart_factory.cn_telemetry`</span> <span class="p">(</span>
  <span class="n">device_number</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">bay_id</span> <span class="n">INT64</span><span class="p">,</span>
  <span class="n">factory</span> <span class="n">STRING</span><span class="p">,</span>
  <span class="n">created</span> <span class="n">STRING</span>
<span class="p">)</span>
<span class="k">OPTIONS</span> <span class="p">(</span>
  <span class="n">format</span> <span class="o">=</span> <span class="s1">'CSV'</span><span class="p">,</span>
  <span class="n">uris</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'gs://factory-dl/cn/dev-900/telemetry*.csv.gz'</span><span class="p">],</span>
  <span class="n">skip_leading_rows</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
  <span class="n">compression</span> <span class="o">=</span> <span class="s1">'GZIP'</span>

</code></pre></div></div>

<p><strong>Human-in-the-Loop DevOps</strong></p>
<ul>
  <li>Use AI to update the unified view with the new data feed, as sketched below. The DevOps team then reviews and applies the changes, since changes to a production view require approval.</li>
</ul>
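
<p>Assuming the China feed gets its own abstraction view (<code class="language-plaintext highlighter-rouge">vw_cn_telemetry</code>) mirroring the US and MEX views, the regenerated unified view would look roughly like this:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Regenerated unified view, now including the China feed
CREATE OR REPLACE VIEW `smart_factory.vw_telemetry` AS
SELECT device_number, bay_id, factory, created
FROM `smart_factory.vw_us_telemetry`
UNION ALL
SELECT device_number, bay_id, factory, created
FROM `smart_factory.vw_mx_telemetry`
UNION ALL
SELECT device_number, bay_id, factory, created
FROM `smart_factory.vw_cn_telemetry`;
</code></pre></div></div>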

<h4 id="manufacturing-spc--root-cause-analysis">Manufacturing SPC &amp; Root Cause Analysis</h4>

<ul>
  <li>This query calculates a rolling mean and standard deviation over a 20-reading window per machine, limited to the past hour of telemetry, to detect anomalies: “Out of Control” conditions.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">WITH</span> <span class="n">TelemetryStats</span> <span class="k">AS</span> <span class="p">(</span>
  <span class="k">SELECT</span>
    <span class="n">machine_id</span><span class="p">,</span>
    <span class="nb">timestamp</span><span class="p">,</span>
    <span class="n">sensor_reading</span><span class="p">,</span>
    <span class="c1">-- Calculate rolling stats for the "Control Chart"</span>
    <span class="k">AVG</span><span class="p">(</span><span class="n">sensor_reading</span><span class="p">)</span> <span class="n">OVER</span><span class="p">(</span><span class="k">PARTITION</span> <span class="k">BY</span> <span class="n">machine_id</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="nb">timestamp</span> <span class="k">ROWS</span> <span class="k">BETWEEN</span> <span class="mi">20</span> <span class="k">PRECEDING</span> <span class="k">AND</span> <span class="k">CURRENT</span> <span class="k">ROW</span><span class="p">)</span> <span class="k">as</span> <span class="n">rolling_avg</span><span class="p">,</span>
    <span class="n">STDDEV</span><span class="p">(</span><span class="n">sensor_reading</span><span class="p">)</span> <span class="n">OVER</span><span class="p">(</span><span class="k">PARTITION</span> <span class="k">BY</span> <span class="n">machine_id</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="nb">timestamp</span> <span class="k">ROWS</span> <span class="k">BETWEEN</span> <span class="mi">20</span> <span class="k">PRECEDING</span> <span class="k">AND</span> <span class="k">CURRENT</span> <span class="k">ROW</span><span class="p">)</span> <span class="k">as</span> <span class="n">rolling_stddev</span>
  <span class="k">FROM</span> <span class="nv">`production_data.mx_telemetry_stream`</span>
  <span class="k">WHERE</span> <span class="nb">timestamp</span> <span class="o">&gt;</span> <span class="n">TIMESTAMP_SUB</span><span class="p">(</span><span class="k">CURRENT_TIMESTAMP</span><span class="p">(),</span> <span class="n">INTERVAL</span> <span class="mi">1</span> <span class="n">HOUR</span><span class="p">)</span>
<span class="p">),</span>
<span class="n">Anomalies</span> <span class="k">AS</span> <span class="p">(</span>
  <span class="k">SELECT</span> <span class="o">*</span><span class="p">,</span>
    <span class="c1">-- Define "Out of Control" (Reading &gt; 3 Sigma from mean)</span>
    <span class="k">ABS</span><span class="p">(</span><span class="n">sensor_reading</span> <span class="o">-</span> <span class="n">rolling_avg</span><span class="p">)</span> <span class="o">&gt;</span> <span class="p">(</span><span class="mi">3</span> <span class="o">*</span> <span class="n">rolling_stddev</span><span class="p">)</span> <span class="k">AS</span> <span class="n">is_out_of_control</span>
  <span class="k">FROM</span> <span class="n">TelemetryStats</span>
<span class="p">)</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">Anomalies</span> <span class="k">WHERE</span> <span class="n">is_out_of_control</span> <span class="o">=</span> <span class="k">TRUE</span><span class="p">;</span>

</code></pre></div></div>

<h4 id="control-chart-visualization">Control Chart Visualization</h4>

<p><img src="../../assets/2026/ozkary-the-cognitive-data-lakehouse-ai-driven-unification-and-semantic-modeling-in-a-zero-etl-environment-control-charts.png" alt="The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment - Control Charts" /></p>

<h4 id="advantage-comparison-matrix">ADVANTAGE COMPARISON MATRIX</h4>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Metric</th>
      <th style="text-align: left">Manual Data Engineering</th>
      <th style="text-align: left">AI-Augmented Zero-ETL</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">Unification Speed</td>
      <td style="text-align: left">Days/Weeks per Source</td>
      <td style="text-align: left">Minutes via Generative AI</td>
    </tr>
    <tr>
      <td style="text-align: left">Schema Drift</td>
      <td style="text-align: left">Manual Script Rewrites</td>
      <td style="text-align: left">Adaptive AI View Discovery</td>
    </tr>
    <tr>
      <td style="text-align: left">Infrastructure Cost</td>
      <td style="text-align: left">High (Data Redundancy)</td>
      <td style="text-align: left">Minimal (In-place on GCS)</td>
    </tr>
  </tbody>
</table>

<p><strong>Strategic Intelligence ROI:</strong></p>

<blockquote>
  <p>ROI(ai) = Insights Velocity / (Movement Cost + Labor Hours)</p>
</blockquote>

<h4 id="final-thoughts-strategic-summary">FINAL THOUGHTS: STRATEGIC SUMMARY</h4>

<p><strong>Legacy Challenges</strong></p>

<ul>
  <li>Brittle ETL: Manual pipelines break with every schema change.</li>
  <li>Cost Inefficiency: Redundant storage for processed data.</li>
  <li>Semantic Silos: Hard-coded aliases for disparate naming conventions.</li>
  <li>Slow Time-to-Insight: Weeks spent on manual schema alignment.</li>
</ul>

<p><strong>AI-Assisted Solutions</strong></p>

<ul>
  <li>Zero-ETL Arch: Cost-effective storage with Data Lake virtual access.</li>
  <li>Automated Inference: Vertex AI handles the “heavy lifting” of mapping.</li>
  <li>Adaptive DevOps: Scalable model evolution (USA → MEX → China).</li>
  <li>Unified Intelligence: One virtual source of truth for global analytics.</li>
</ul>

<blockquote>
  <p>Moving from Data Reporting to Active Semantic Intelligence.</p>
</blockquote>

<h3 id="weve-covered-a-lot-today-but-this-is-just-the-beginning">We’ve covered a lot today, but this is just the beginning!</h3>

<p>If you’re interested in learning more about building cloud data pipelines, I encourage you to check out my book, ‘Data Engineering Process Fundamentals.’ It provides in-depth explanations, code samples, and practical exercises to support your learning.</p>

<p><a href="https://a.co/d/gyoRfbs"><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-book-cover.jpg" alt="Data Engineering Process Fundamentals - Book by Oscar Garcia" /></a>  <a href="https://a.co/d/gyoRfbs"><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-book-back-cover.jpg" alt="Data Engineering Process Fundamentals - Book by Oscar Garcia" /></a></p>

<hr />

<h3 id="-upcoming-sessions">📅 Upcoming Sessions</h3>
<p>Our upcoming series expands beyond data engineering to bridge the gap between <strong>AI</strong>, <strong>Machine Learning</strong>, and <strong>modern cloud architecture</strong>. Using our <a href="https://github.com/ozkary/data-engineering-mta-turnstile">Data</a>, <a href="https://github.com/ozkary/ai-engineering">AI</a>, and <a href="https://github.com/ozkary/machine-learning-engineering">ML</a> GitHub blueprints, we provide the code-first patterns needed to build everything from Zero-ETL pipelines to scalable LLM-powered systems. Join us to explore how these integrated disciplines work together to turn raw data into production-ready intelligence.</p>

<hr />
<h3 id="-lets-connect--build-together">🌟 Let’s Connect &amp; Build Together</h3>
<p>If you enjoyed these resources, let’s stay in touch! I share deep-dives into AI/ML patterns and host community events here:</p>

<ul>
  <li><strong><a href="https://gdg.community.dev/gdg-broward-county-fl/">GDG Broward</a></strong>: Join our local dev community for meetups and workshops.</li>
  <li><strong><a href="https://www.linkedin.com/in/oscardgarcia">LinkedIn</a></strong>: Let’s connect professionally! I share insights on engineering.</li>
  <li><strong><a href="https://github.com/ozkary">GitHub</a></strong>: Follow my open-source journey and star the repos you find useful.</li>
  <li><strong><a href="https://www.youtube.com/@ozkary">YouTube</a></strong>: Watch step-by-step tutorials on the projects listed above.</li>
  <li><strong><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></strong> / <strong><a href="https://x.com/ozkary">X / Twitter</a></strong>: Daily tech updates and quick engineering tips.</li>
</ul>

<p>👉 <em>Originally published at <a href="https://www.ozkary.com">ozkary.com</a></em></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="python" /><category term="cloud" /><category term="github" /><category term="vscode" /><category term="docker" /><category term="data lake" /><category term="data warehouse" /><category term="Kafka" /><category term="Spark" /><summary type="html"><![CDATA[In the modern data landscape, the wall between where data lives and how we get insights is crumbling. This session focuses on the Cognitive Data Lakehouse. A paradigm shift that allows developers to treat a fragmented data lake as a unified, high-performance warehouse.]]></summary></entry><entry><title type="html">From Raw Data to Governance: Refining Data with the Medallion Architecture Dec 2025</title><link href="https://www.ozkary.dev/from-raw-data-to-governance-refining-data-medallion-architecture-dec-2025/" rel="alternate" type="text/html" title="From Raw Data to Governance: Refining Data with the Medallion Architecture Dec 2025" /><published>2025-12-10T00:00:00-05:00</published><updated>2025-12-10T08:00:00-05:00</updated><id>https://www.ozkary.dev/from-raw-data-to-governance-refining-data-medallion-architecture-dec-2025</id><content type="html" xml:base="https://www.ozkary.dev/from-raw-data-to-governance-refining-data-medallion-architecture-dec-2025/"><![CDATA[<h1 id="overview">Overview</h1>

<p>Build upon your existing data engineering expertise and discover how Medallion Architecture can transform your data strategy. This session provides a hands-on approach to implementing Medallion principles, empowering you to create a robust, scalable, and governed data platform.</p>

<p>We’ll explore how to align data engineering processes with Medallion Architecture, identifying opportunities for optimization and improvement. By understanding the core principles and practical implementation steps, you’ll learn how to optimize data pipelines, enhance data quality, and unlock valuable insights through a structured, layered approach to drive business success.</p>

<p><img src="../../assets/2025/ozkary-from-raw-data-to-governance-medallion-architecture.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture" title="From Raw Data to Governance: Refining Data with the Medallion Architecture" /></p>

<ul>
  <li>Follow along with this GitHub repo during the presentation (and give it a star):</li>
</ul>

<blockquote>
  <p>👉 https://github.com/ozkary/data-engineering-mta-turnstile</p>
</blockquote>

<ul>
  <li>Read more information on my blog at:</li>
</ul>

<blockquote>
  <p>👉 https://www.ozkary.com/2023/03/data-engineering-process-fundamentals.html</p>
</blockquote>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/E87qPNObF7g?si=f6ii8FOVH8sPI0Dv" title="From Raw Data to Governance: Refining Data with the Medallion Architecture" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<h3 id="video-agenda">Video Agenda</h3>

<ul>
  <li>Introduction to Medallion Architecture
    <ul>
      <li>Defining Medallion Architecture</li>
      <li>Core Principles</li>
      <li>Benefits of Medallion Architecture</li>
    </ul>
  </li>
  <li>The Raw Zone
    <ul>
      <li>Understanding the purpose of the Raw Zone</li>
      <li>Best practices for data ingestion and storage</li>
    </ul>
  </li>
  <li>The Bronze Zone
    <ul>
      <li>Data transformation and cleansing</li>
      <li>Creating a foundation for analysis</li>
    </ul>
  </li>
  <li>The Silver Zone
    <ul>
      <li>Data optimization and summarization</li>
      <li>Preparing data for consumption</li>
    </ul>
  </li>
  <li>The Gold Zone
    <ul>
      <li>Curated data for insights and action</li>
      <li>Enabling self-service analytics</li>
    </ul>
  </li>
  <li>Empowering Insights
    <ul>
      <li>Data-driven decision-making</li>
      <li>Accelerated Insights</li>
    </ul>
  </li>
  <li>Data Governance
    <ul>
      <li>Importance of data governance in Medallion Architecture</li>
      <li>Implementing data ownership and stewardship</li>
      <li>Ensuring data quality and security</li>
    </ul>
  </li>
</ul>

<p><strong>Why Attend:</strong></p>

<p>Gain a deep understanding of Medallion Architecture and its application in modern data engineering. Learn how to optimize data pipelines, improve data quality, and unlock valuable insights. Discover practical steps to implement Medallion principles in your organization and drive data-driven decision-making.</p>

<h2 id="presentation">Presentation</h2>

<h3 id="introducing-medallion-architecture">Introducing Medallion Architecture</h3>

<p>Medallion architecture is a data management approach that organizes data into distinct layers based on its quality and processing level.</p>

<ul>
  <li><strong>Improved Data Quality:</strong> By separating data into different zones, you can focus on data quality at each stage.</li>
  <li><strong>Enhanced Data Governance:</strong> Clear data ownership and lineage improve data trustworthiness.</li>
  <li><strong>Accelerated Insights:</strong> Optimized data in the Silver and Gold zones enables faster query performance.</li>
  <li><strong>Scalability:</strong> The layered approach can accommodate growing data volumes and complexity.</li>
  <li><strong>Cost Efficiency:</strong> Optimized data storage and processing can reduce costs.</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-high-level-design.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Design Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Design Diagram" /></p>

<h3 id="the-raw-zone-foundation-of-your-data-lake">The Raw Zone: Foundation of Your Data Lake</h3>

<p>The Raw Zone is the initial landing place for raw, unprocessed data. It serves as a historical archive of your data sources.</p>

<ul>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Unstructured or semi-structured format (e.g., CSV, JSON, Parquet)</li>
      <li>Data is ingested as-is, without any cleaning or transformation</li>
      <li>High volume and velocity</li>
      <li>Data retention policies are crucial</li>
    </ul>
  </li>
  <li><strong>Benefits:</strong>
    <ul>
      <li>Preserves original data for potential future analysis</li>
      <li>Enables data reprocessing</li>
      <li>Supports data lineage and auditability</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-raw-zone.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Raw Zone Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Raw Zone Diagram" /></p>

<h3 id="the-bronze-zone-transforming-raw-data">The Bronze Zone: Transforming Raw Data</h3>

<p>The Bronze Zone is where raw data undergoes initial cleaning, structuring, and transformation. It serves as a staging area for data before moving to the Silver Zone.</p>

<ul>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Data is cleansed and standardized</li>
      <li>Basic transformations are applied (e.g., data type conversions, null handling); see the sketch after this list</li>
      <li>Data is structured into tables or views</li>
      <li>Data quality checks are implemented</li>
      <li>Data retention policies may be shorter than the Raw Zone</li>
    </ul>
  </li>
  <li><strong>Benefits:</strong>
    <ul>
      <li>Improves data quality and consistency</li>
      <li>Provides a foundation for further analysis</li>
      <li>Enables data exploration and discovery</li>
    </ul>
  </li>
</ul>
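
<p>A minimal BigQuery-style sketch of a Bronze-zone transformation (table and column names are hypothetical): type conversion, null handling, de-duplication, and a basic quality filter over a raw feed.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical names: raw_zone.telemetry is the raw landing table
CREATE OR REPLACE TABLE `bronze_zone.telemetry` AS
SELECT DISTINCT
  device_number,
  SAFE_CAST(bay_id AS INT64) AS bay_id,       -- type conversion
  COALESCE(factory, 'UNKNOWN') AS factory,    -- null handling
  SAFE_CAST(created AS DATETIME) AS created
FROM `raw_zone.telemetry`
WHERE device_number IS NOT NULL;              -- basic quality check
</code></pre></div></div>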

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-bronze-zone.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Bronze Zone Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Bronze Zone Diagram" /></p>

<h3 id="the-silver-zone-a-foundation-for-insights">The Silver Zone: A Foundation for Insights</h3>

<p>The Silver Zone houses data that has been further refined, aggregated, and optimized for specific use cases. It serves as a bridge between the raw data and the final curated datasets.</p>

<ul>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Data is cleansed, standardized, and enriched</li>
      <li>Data is structured for analytical purposes (e.g., normalized, de-normalized)</li>
      <li>Data is optimized for query performance (e.g., partitioning, indexing)</li>
      <li>Data is aggregated and summarized for specific use cases</li>
    </ul>
  </li>
  <li><strong>Benefits:</strong>
    <ul>
      <li>Improved query performance</li>
      <li>Supports self-service analytics</li>
      <li>Enables advanced analytics and machine learning</li>
      <li>Reduces query costs</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-silver-zone.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Silver Zone Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Silver Zone Diagram" /></p>

<h3 id="the-gold-zone-your-datas-final-destination">The Gold Zone: Your Data’s Final Destination</h3>

<ul>
  <li><strong>Definition:</strong> The Gold Zone contains the final, curated datasets ready for consumption by business users and applications. It is the pinnacle of data transformation and optimization.</li>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Data is highly refined, aggregated, and optimized for specific use cases</li>
      <li>Data is often materialized for performance</li>
      <li>Data is subject to rigorous quality checks and validation</li>
      <li>Data is secured and governed</li>
    </ul>
  </li>
  <li><strong>Benefits:</strong>
    <ul>
      <li>Enables rapid insights and decision-making</li>
      <li>Supports self-service analytics and reporting</li>
      <li>Provides a foundation for advanced analytics and machine learning</li>
      <li>Reduces query latency</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-gold-zone.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Gold Zone Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Gold Zone Diagram" /></p>

<h3 id="the-gold-zone-empowering-insights-and-actions">The Gold Zone: Empowering Insights and Actions</h3>

<p>The Gold Zone is the final destination for data, providing a foundation for insights, analysis, and action. It houses curated, optimized datasets ready for consumption.</p>

<ul>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Data is accessible and easily consumable</li>
      <li>Supports various analytical tools and platforms (BI, ML, data science)</li>
      <li>Enables self-service analytics</li>
      <li>Drives business decisions and actions</li>
    </ul>
  </li>
  <li><strong>Examples of Consumption Tools:</strong>
    <ul>
      <li>Business Intelligence (BI) tools (Looker, Tableau, Power BI)</li>
      <li>Data science languages and tools (Python, R, SQL)</li>
      <li>Machine learning platforms (TensorFlow, PyTorch)</li>
      <li>Advanced analytics tools</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-analysis.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Analysis Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Analysis Diagram" /></p>

<h3 id="data-governance-the-cornerstone-of-data-management">Data Governance: The Cornerstone of Data Management</h3>

<p><strong>Data governance</strong> is the framework that defines how data is managed within an organization, while <strong>data management</strong> is the operational execution of those policies. Data governance is essential for ensuring data quality, consistency, and security.</p>

<p><strong>Key components of data governance include:</strong></p>

<ul>
  <li><strong>Data Lineage:</strong> Tracking data’s journey from source to consumption.</li>
  <li><strong>Data Ownership:</strong> Defining who is responsible for data accuracy and usage.</li>
  <li><strong>Data Stewardship:</strong> Managing data on a day-to-day basis, ensuring quality and compliance.</li>
  <li><strong>Data Security:</strong> Protecting data from unauthorized access, use, disclosure, disruption, modification, or destruction.</li>
  <li><strong>Compliance:</strong> Adhering to industry regulations (e.g., GDPR, CCPA, HIPAA) and internal policies.</li>
</ul>

<p>By establishing clear roles, responsibilities, and data lineage, organizations can build trust in their data, improve decision-making, and mitigate risks.</p>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-governance.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Data Governance " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Data Governance" /></p>

<h3 id="data-transformation-and-incremental-strategy">Data Transformation and Incremental Strategy</h3>

<p>The data transformation phase is a critical stage in a data warehouse project. It involves several key steps: data extraction, cleaning, loading, data type casting, and the use of naming conventions, along with incremental loads that insert only the new records added since the last update via batch processes.</p>
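
<p>A minimal sketch of a watermark-based incremental load is shown below; the state file, paths, and timestamp column are illustrative assumptions:</p>

<pre><code class="language-python"># Incremental load: insert only the records added since the last update.
import json
from pathlib import Path
import pandas as pd

STATE_FILE = Path("state/last_update.json")

def load_increment(source_parquet: str) -> pd.DataFrame:
    last_update = "1970-01-01"
    if STATE_FILE.exists():
        last_update = json.loads(STATE_FILE.read_text())["last_update"]
    df = pd.read_parquet(source_parquet)
    new_rows = df[df["created_dt"] > pd.Timestamp(last_update)]
    if not new_rows.empty:
        # Stage only the new records for the batch load, then advance the watermark.
        new_rows.to_parquet("warehouse/staging/new_batch.parquet", index=False)
        STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
        STATE_FILE.write_text(json.dumps({"last_update": str(new_rows["created_dt"].max())}))
    return new_rows
</code></pre>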

<p><img src="../../assets/2024/ozkary-data-engineering-process-lineage.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Data transformation lineage" title="From Raw Data to Governance: Refining Data with the Medallion Architecture -   Data transformation lineage" /></p>

<p><strong>Data Lineage:</strong> Tracks the flow of data from its origin to its destination, including all the intermediate processes and transformations that it undergoes.</p>

<h3 id="data-governance--metadata">Data Governance : Metadata</h3>

<p>Metadata assigns the owner, the steward, and the responsibilities associated with the data.</p>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-metadata.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Governance Metadata " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Governance metadata" /></p>

<h3 id="summary-leverage-medallion-architecture-for-success">Summary: Leverage Medallion Architecture for Success</h3>

<ul>
  <li><strong>Key Benefits:</strong>
    <ul>
      <li>Improved data quality</li>
      <li>Enhanced governance</li>
      <li>Accelerated insights</li>
      <li>Scalability</li>
      <li>Cost efficiency</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-diagram.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Diagram" /></p>

<h3 id="weve-covered-a-lot-today-but-this-is-just-the-beginning">We’ve covered a lot today, but this is just the beginning!</h3>

<p>If you’re interested in learning more about building cloud data pipelines, I encourage you to check out my book, ‘Data Engineering Process Fundamentals,’ which anchors the series of the same name. It provides in-depth explanations, code samples, and practical exercises to support your learning.</p>

<p><a href="https://a.co/d/gyoRfbs"><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-book-cover.jpg" alt="Data Engineering Process Fundamentals - Book by Oscar Garcia" /></a>  <a href="https://a.co/d/gyoRfbs"><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-book-back-cover.jpg" alt="Data Engineering Process Fundamentals - Book by Oscar Garcia" /></a></p>

<p><strong>Upcoming Talks:</strong></p>

<p>Join us for subsequent sessions in our Data Engineering Process Fundamentals series, where we will delve deeper into specific facets of data engineering, exploring topics such as data modeling, pipelines, and best practices in data governance.</p>

<p>This presentation is based on the book, <a href="https://www.amazon.com/Data-Engineering-Process-Fundamentals-Hands/dp/B0CV7TPSNB">Data Engineering Process Fundamentals</a>, which provides a more comprehensive guide to the topics we’ll cover. You can find all the sample code and datasets used in this presentation on our popular GitHub repository <a href="https://github.com/ozkary/data-engineering-mta-turnstile">Introduction to Data Engineering Process Fundamentals</a>.</p>

<p>Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!</p>

<ul>
  <li><a href="https://gdg.community.dev/gdg-broward-county-fl/">Google Developer Group</a></li>
  <li><a href="https://github.com/ozkary">GitHub</a></li>
  <li><a href="https://x.com/ozkary">Twitter</a></li>
  <li><a href="https://www.youtube.com/@ozkary">YouTube</a></li>
  <li><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></li>
</ul>

<p>👍 Originally published by <a href="https://www.ozkary.com">ozkary.com</a></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="python" /><category term="cloud" /><category term="github" /><category term="vscode" /><category term="docker" /><category term="data lake" /><category term="Kafka" /><category term="Spark" /><summary type="html"><![CDATA[Gain understanding of Medallion Architecture and its application in modern data engineering. Learn how to optimize data pipelines, improve data quality, and unlock valuable insights. Discover practical steps to implement Medallion principles in your organization and drive data-driven decision-making.]]></summary></entry><entry><title type="html">From Raw Data to Governance: Refining Data with the Medallion Architecture</title><link href="https://www.ozkary.dev/from-raw-data-to-governance-refining-data-medallion-architecture/" rel="alternate" type="text/html" title="From Raw Data to Governance: Refining Data with the Medallion Architecture" /><published>2025-11-19T00:00:00-05:00</published><updated>2025-11-20T08:00:00-05:00</updated><id>https://www.ozkary.dev/from-raw-data-to-governance-refining-data-medallion-architecture</id><content type="html" xml:base="https://www.ozkary.dev/from-raw-data-to-governance-refining-data-medallion-architecture/"><![CDATA[<h1 id="overview">Overview</h1>

<p>Build upon your existing data engineering expertise and discover how Medallion Architecture can transform your data strategy. This session provides a hands-on approach to implementing Medallion principles, empowering you to create a robust, scalable, and governed data platform.</p>

<p>We’ll explore how to align data engineering processes with Medallion Architecture, identifying opportunities for optimization and improvement. By understanding the core principles and practical implementation steps, you’ll learn how to optimize data pipelines, enhance data quality, and unlock valuable insights through a structured, layered approach to drive business success.</p>

<p><img src="../../assets/2025/ozkary-from-raw-data-to-governance-medallion-architecture.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture" title="From Raw Data to Governance: Refining Data with the Medallion Architecture" /></p>

<ul>
  <li>Follow this GitHub repo during the presentation: (Give it a star)</li>
</ul>

<blockquote>
  <p>👉 https://github.com/ozkary/data-engineering-mta-turnstile</p>
</blockquote>

<ul>
  <li>Read more information on my blog at:</li>
</ul>

<blockquote>
  <p>👉 https://www.ozkary.com/2023/03/data-engineering-process-fundamentals.html</p>
</blockquote>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/EPYbPLKUxDE?si=sAe8k3beWEcxBEYT" title="From Raw Data to Governance: Refining Data with the Medallion Architecture" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<h3 id="video-agenda">Video Agenda</h3>

<ul>
  <li>Introduction to Medallion Architecture
    <ul>
      <li>Defining Medallion Architecture</li>
      <li>Core Principles</li>
      <li>Benefits of Medallion Architecture</li>
    </ul>
  </li>
  <li>The Raw Zone
    <ul>
      <li>Understanding the purpose of the Raw Zone</li>
      <li>Best practices for data ingestion and storage</li>
    </ul>
  </li>
  <li>The Bronze Zone
    <ul>
      <li>Data transformation and cleansing</li>
      <li>Creating a foundation for analysis</li>
    </ul>
  </li>
  <li>The Silver Zone
    <ul>
      <li>Data optimization and summarization</li>
      <li>Preparing data for consumption</li>
    </ul>
  </li>
  <li>The Gold Zone
    <ul>
      <li>Curated data for insights and action</li>
      <li>Enabling self-service analytics</li>
    </ul>
  </li>
  <li>Empowering Insights
    <ul>
      <li>Data-driven decision-making</li>
      <li>Accelerated Insights</li>
    </ul>
  </li>
  <li>Data Governance
    <ul>
      <li>Importance of data governance in Medallion Architecture</li>
      <li>Implementing data ownership and stewardship</li>
      <li>Ensuring data quality and security</li>
    </ul>
  </li>
</ul>

<p><strong>Why Attend:</strong></p>

<p>Gain a deep understanding of Medallion Architecture and its application in modern data engineering. Learn how to optimize data pipelines, improve data quality, and unlock valuable insights. Discover practical steps to implement Medallion principles in your organization and drive data-driven decision-making.</p>

<h2 id="presentation">Presentation</h2>

<h3 id="introducing-medallion-architecture">Introducing Medallion Architecture</h3>

<p>Medallion architecture is a data management approach that organizes data into distinct layers based on its quality and processing level.</p>

<ul>
  <li><strong>Improved Data Quality:</strong> By separating data into different zones, you can focus on data quality at each stage.</li>
  <li><strong>Enhanced Data Governance:</strong> Clear data ownership and lineage improve data trustworthiness.</li>
  <li><strong>Accelerated Insights:</strong> Optimized data in the Silver and Gold zones enables faster query performance.</li>
  <li><strong>Scalability:</strong> The layered approach can accommodate growing data volumes and complexity.</li>
  <li><strong>Cost Efficiency:</strong> Optimized data storage and processing can reduce costs.</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-high-level-design.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Design Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Design Diagram" /></p>

<h3 id="the-raw-zone-foundation-of-your-data-lake">The Raw Zone: Foundation of Your Data Lake</h3>

<p>The Raw Zone is the initial landing place for raw, unprocessed data. It serves as a historical archive of your data sources.</p>

<ul>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Structured, semi-structured, or unstructured formats (e.g., CSV, JSON, Parquet)</li>
      <li>Data is ingested as-is, without any cleaning or transformation</li>
      <li>High volume and velocity</li>
      <li>Data retention policies are crucial</li>
    </ul>
  </li>
  <li><strong>Benefits:</strong>
    <ul>
      <li>Preserves original data for potential future analysis</li>
      <li>Enables data reprocessing</li>
      <li>Supports data lineage and auditability</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-raw-zone.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Raw Zone Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Raw Zone Diagram" /></p>

<h3 id="the-bronze-zone-transforming-raw-data">The Bronze Zone: Transforming Raw Data</h3>

<p>The Bronze Zone is where raw data undergoes initial cleaning, structuring, and transformation. It serves as a staging area for data before moving to the Silver Zone.</p>

<ul>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Data is cleansed and standardized</li>
      <li>Basic transformations are applied (e.g., data type conversions, null handling)</li>
      <li>Data is structured into tables or views</li>
      <li>Data quality checks are implemented</li>
      <li>Data retention policies may be shorter than the Raw Zone</li>
    </ul>
  </li>
  <li><strong>Benefits:</strong>
    <ul>
      <li>Improves data quality and consistency</li>
      <li>Provides a foundation for further analysis</li>
      <li>Enables data exploration and discovery</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-bronze-zone.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Bronze Zone Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Bronze Zone Diagram" /></p>

<h3 id="the-silver-zone-a-foundation-for-insights">The Silver Zone: A Foundation for Insights</h3>

<p>The Silver Zone houses data that has been further refined, aggregated, and optimized for specific use cases. It serves as a bridge between the raw data and the final curated datasets.</p>

<ul>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Data is cleansed, standardized, and enriched</li>
      <li>Data is structured for analytical purposes (e.g., normalized, de-normalized)</li>
      <li>Data is optimized for query performance (e.g., partitioning, indexing)</li>
      <li>Data is aggregated and summarized for specific use cases</li>
    </ul>
  </li>
  <li><strong>Benefits:</strong>
    <ul>
      <li>Improved query performance</li>
      <li>Supports self-service analytics</li>
      <li>Enables advanced analytics and machine learning</li>
      <li>Reduces query costs</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-silver-zone.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Silver Zone Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Silver Zone Diagram" /></p>

<h3 id="the-gold-zone-your-datas-final-destination">The Gold Zone: Your Data’s Final Destination</h3>

<ul>
  <li><strong>Definition:</strong> The Gold Zone contains the final, curated datasets ready for consumption by business users and applications. It is the pinnacle of data transformation and optimization.</li>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Data is highly refined, aggregated, and optimized for specific use cases</li>
      <li>Data is often materialized for performance</li>
      <li>Data is subject to rigorous quality checks and validation</li>
      <li>Data is secured and governed</li>
    </ul>
  </li>
  <li><strong>Benefits:</strong>
    <ul>
      <li>Enables rapid insights and decision-making</li>
      <li>Supports self-service analytics and reporting</li>
      <li>Provides a foundation for advanced analytics and machine learning</li>
      <li>Reduces query latency</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-gold-zone.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Gold Zone Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Gold Zone Diagram" /></p>

<h3 id="the-gold-zone-empowering-insights-and-actions">The Gold Zone: Empowering Insights and Actions</h3>

<p>The Gold Zone is the final destination for data, providing a foundation for insights, analysis, and action. It houses curated, optimized datasets ready for consumption.</p>

<ul>
  <li><strong>Key Characteristics:</strong>
    <ul>
      <li>Data is accessible and easily consumable</li>
      <li>Supports various analytical tools and platforms (BI, ML, data science)</li>
      <li>Enables self-service analytics</li>
      <li>Drives business decisions and actions</li>
    </ul>
  </li>
  <li><strong>Examples of Consumption Tools:</strong>
    <ul>
      <li>Business Intelligence (BI) tools (Looker, Tableau, Power BI)</li>
      <li>Data science languages and tools (Python, R, SQL)</li>
      <li>Machine learning platforms (TensorFlow, PyTorch)</li>
      <li>Advanced analytics tools</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-analysis.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Analysis Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Analysis Diagram" /></p>

<h3 id="data-governance-the-cornerstone-of-data-management">Data Governance: The Cornerstone of Data Management</h3>

<p><strong>Data governance</strong> is the framework that defines how data is managed within an organization, while <strong>data management</strong> is the operational execution of those policies. Data governance is essential for ensuring data quality, consistency, and security.</p>

<p><strong>Key components of data governance include:</strong></p>

<ul>
  <li><strong>Data Lineage:</strong> Tracking data’s journey from source to consumption.</li>
  <li><strong>Data Ownership:</strong> Defining who is responsible for data accuracy and usage.</li>
  <li><strong>Data Stewardship:</strong> Managing data on a day-to-day basis, ensuring quality and compliance.</li>
  <li><strong>Data Security:</strong> Protecting data from unauthorized access, use, disclosure, disruption, modification, or destruction.</li>
  <li><strong>Compliance:</strong> Adhering to industry regulations (e.g., GDPR, CCPA, HIPAA) and internal policies.</li>
</ul>

<p>By establishing clear roles, responsibilities, and data lineage, organizations can build trust in their data, improve decision-making, and mitigate risks.</p>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-governance.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Data Governance " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Data Governance" /></p>

<h3 id="data-transformation-and-incremental-strategy">Data Transformation and Incremental Strategy</h3>

<p>The data transformation phase is a critical stage in a data warehouse project. It involves several key steps: data extraction, cleaning, loading, data type casting, and the use of naming conventions, along with incremental loads that insert only the new records added since the last update via batch processes.</p>
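
<p>A minimal sketch of a watermark-based incremental load is shown below; the state file, paths, and timestamp column are illustrative assumptions:</p>

<pre><code class="language-python"># Incremental load: insert only the records added since the last update.
import json
from pathlib import Path
import pandas as pd

STATE_FILE = Path("state/last_update.json")

def load_increment(source_parquet: str) -> pd.DataFrame:
    last_update = "1970-01-01"
    if STATE_FILE.exists():
        last_update = json.loads(STATE_FILE.read_text())["last_update"]
    df = pd.read_parquet(source_parquet)
    new_rows = df[df["created_dt"] > pd.Timestamp(last_update)]
    if not new_rows.empty:
        # Stage only the new records for the batch load, then advance the watermark.
        new_rows.to_parquet("warehouse/staging/new_batch.parquet", index=False)
        STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
        STATE_FILE.write_text(json.dumps({"last_update": str(new_rows["created_dt"].max())}))
    return new_rows
</code></pre>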

<p><img src="../../assets/2024/ozkary-data-engineering-process-lineage.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Data transformation lineage" title="From Raw Data to Governance: Refining Data with the Medallion Architecture -   Data transformation lineage" /></p>

<p><strong>Data Lineage:</strong> Tracks the flow of data from its origin to its destination, including all the intermediate processes and transformations that it undergoes.</p>

<h3 id="data-governance--metadata">Data Governance : Metadata</h3>

<p>Metadata assigns the owner, the steward, and the responsibilities associated with the data.</p>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-metadata.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Governance Metadata " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Governance metadata" /></p>

<h3 id="summary-leverage-medallion-architecture-for-success">Summary: Leverage Medallion Architecture for Success</h3>

<ul>
  <li><strong>Key Benefits:</strong>
    <ul>
      <li>Improved data quality</li>
      <li>Enhanced governance</li>
      <li>Accelerated insights</li>
      <li>Scalability</li>
      <li>Cost efficiency</li>
    </ul>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-medallion-architecture-diagram.png" alt="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Diagram " title="From Raw Data to Governance: Refining Data with the Medallion Architecture -  Medallion Architecture Diagram" /></p>

<h3 id="weve-covered-a-lot-today-but-this-is-just-the-beginning">We’ve covered a lot today, but this is just the beginning!</h3>

<p>If you’re interested in learning more about building cloud data pipelines, I encourage you to check out my book, ‘Data Engineering Process Fundamentals,’ which anchors the series of the same name. It provides in-depth explanations, code samples, and practical exercises to support your learning.</p>

<p><a href="https://a.co/d/gyoRfbs"><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-book-cover.jpg" alt="Data Engineering Process Fundamentals - Book by Oscar Garcia" /></a>  <a href="https://a.co/d/gyoRfbs"><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-book-back-cover.jpg" alt="Data Engineering Process Fundamentals - Book by Oscar Garcia" /></a></p>

<p><strong>Upcoming Talks:</strong></p>

<p>Join us for subsequent sessions in our Data Engineering Process Fundamentals series, where we will delve deeper into specific facets of data engineering, exploring topics such as data modeling, pipelines, and best practices in data governance.</p>

<p>This presentation is based on the book, <a href="https://www.amazon.com/Data-Engineering-Process-Fundamentals-Hands/dp/B0CV7TPSNB">Data Engineering Process Fundamentals</a>, which provides a more comprehensive guide to the topics we’ll cover. You can find all the sample code and datasets used in this presentation on our popular GitHub repository <a href="https://github.com/ozkary/data-engineering-mta-turnstile">Introduction to Data Engineering Process Fundamentals</a>.</p>

<p>Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!</p>

<ul>
  <li><a href="https://gdg.community.dev/gdg-broward-county-fl/">Google Developer Group</a></li>
  <li><a href="https://github.com/ozkary">GitHub</a></li>
  <li><a href="https://x.com/ozkary">Twitter</a></li>
  <li><a href="https://www.youtube.com/@ozkary">YouTube</a></li>
  <li><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></li>
</ul>

<p>👍 Originally published by <a href="https://www.ozkary.com">ozkary.com</a></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="python" /><category term="cloud" /><category term="github" /><category term="vscode" /><category term="docker" /><category term="data lake" /><category term="Kafka" /><category term="Spark" /><summary type="html"><![CDATA[Gain understanding of Medallion Architecture and its application in modern data engineering. Learn how to optimize data pipelines, improve data quality, and unlock valuable insights. Discover practical steps to implement Medallion principles in your organization and drive data-driven decision-making.]]></summary></entry><entry><title type="html">From Raw Data to Analytics: The Modern Data Layer Architecture</title><link href="https://www.ozkary.dev/from-raw-data-to-analytics-the-modern-data-layer-architecture/" rel="alternate" type="text/html" title="From Raw Data to Analytics: The Modern Data Layer Architecture" /><published>2025-10-29T00:00:00-04:00</published><updated>2025-10-29T09:00:00-04:00</updated><id>https://www.ozkary.dev/from-raw-data-to-analytics-the-modern-data-layer-architecture</id><content type="html" xml:base="https://www.ozkary.dev/from-raw-data-to-analytics-the-modern-data-layer-architecture/"><![CDATA[<h1 id="overview">Overview</h1>

<p>This presentation is part of the Data Engineering Process Fundamentals series, focusing on the essential architectural components—the Data Lake and the Data Warehouse—and defining their respective roles in a modern analytics ecosystem.</p>

<p><img src="../../assets/2025/ozkary-data-engineering-data-lake-data-warehouse-raw-data-to-analytics.png" alt="From Raw Data to Analytics: The Modern Data Layer Architecture" title="From Raw Data to Analytics: The Modern Data Layer Architecture" /></p>

<ul>
  <li>Follow this GitHub repo during the presentation: (Star the project to follow and get updates)</li>
</ul>

<blockquote>
  <p>👉 <a href="https://github.com/ozkary/data-engineering-mta-turnstile">GitHub Repo</a></p>
</blockquote>

<ul>
  <li>Data engineering Series:</li>
</ul>

<blockquote>
  <p>👉 <a href="https://www.ozkary.com/2023/03/data-engineering-process-fundamentals.html">Blog Series</a></p>
</blockquote>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/9DYtKz4qVbA?si=mk_s1myV6aB7bBgZ" title="From Raw Data to Analytics: The Modern Data Layer Architecture" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<h3 id="video-agenda">Video Agenda</h3>

<p>Agenda:</p>

<ol>
  <li>Introduction to Data Engineering:</li>
</ol>

<ul>
  <li>
    <p>Brief overview of the data engineering landscape and its critical role in modern data-driven organizations.</p>
  </li>
  <li>
    <p>Operational Data</p>
  </li>
</ul>

<ol start="2">
  <li>Understanding Data Lakes:</li>
</ol>

<ul>
  <li>Explanation of what a data lake is and its purpose in storing vast amounts of raw and unstructured data.</li>
</ul>

<ol start="3">
  <li>Exploring Data Warehouses:</li>
</ol>

<ul>
  <li>Definition of data warehouses and their role in storing structured, processed, and business-ready data.</li>
</ul>

<ol start="4">
  <li>Comparing Data Lakes and Data Warehouses:</li>
</ol>

<ul>
  <li>
    <p>Comparative analysis of data lakes and data warehouses, highlighting their strengths and weaknesses.</p>
  </li>
  <li>
    <p>Discussing when to use each based on specific use cases and business needs.</p>
  </li>
</ul>

<ol start="5">
  <li>Integration and Data Pipelines:</li>
</ol>

<ul>
  <li>
    <p>Insight into the seamless integration of data lakes and data warehouses within a data engineering pipeline.</p>
  </li>
  <li>
    <p>Code walkthrough showcasing data movement and transformation between these two crucial components.</p>
  </li>
</ul>

<ol start="6">
  <li>Real-world Use Cases:</li>
</ol>

<ul>
  <li>
    <p>Presentation of real-world use cases where effective use of data lakes and data warehouses led to actionable insights and business success.</p>
  </li>
  <li>
    <p>Hands-on demonstration using Python, Jupyter Notebook and SQL to solidify the concepts discussed, providing attendees with practical insights and skills.</p>
  </li>
</ul>

<ol start="7">
  <li>Q&amp;A and Hands-on Session:</li>
</ol>

<ul>
  <li>An interactive Q&amp;A session to address any queries.</li>
</ul>

<p>Conclusion:</p>

<p>This session aims to equip attendees with a strong foundation in data engineering, focusing on the pivotal role of data lakes and data warehouses. By the end of this presentation, participants will grasp how to effectively utilize these tools, enabling them to design efficient data solutions and drive informed business decisions.</p>

<p>This presentation will be accompanied by live code demonstrations and interactive discussions, ensuring attendees gain practical knowledge and valuable insights into the dynamic world of data engineering.</p>

<h3 id="supporting-materials-reminder">Supporting Materials Reminder</h3>

<p><strong>Subsequent Sessions:</strong> Join us for future sessions in our Data Engineering Process Fundamentals series, where we will build a data pipeline and delve deeper into topics like orchestration and governance.</p>

<p><strong>Resources:</strong> This presentation is based on the book, Data Engineering Process Fundamentals, and all supporting code and examples are available on our popular GitHub repository.</p>

<h2 id="presentation">Presentation</h2>

<h3 id="data-engineering-overview">Data Engineering Overview</h3>

<p>A Data Engineering Process involves executing steps to understand the problem, scope, design, and architecture for creating a solution. This enables ongoing big data analysis using analytical and visualization tools.</p>

<p><strong>Topics</strong></p>

<ul>
  <li>Data Lake and Data Warehouse</li>
  <li>Discovery and Data Analysis</li>
  <li>Design and Infrastructure Planning</li>
  <li>Data Lake - Pipeline and Orchestration</li>
  <li>Data Warehouse - Design and Implementation</li>
  <li>Analysis and Visualization</li>
</ul>

<p><strong>Follow this project: Give a star</strong></p>

<blockquote>
  <p>👉 <a href="//github.com/ozkary/data-engineering-mta-turnstile">Data Engineering Process Fundamentals</a></p>
</blockquote>

<h3 id="operational-data">Operational Data</h3>

<p>Operational data is often generated by applications and stored in transactional relational databases like SQL Server and Oracle, or in NoSQL (document) databases like MongoDB and Firebase. This is the data created when an application saves a user transaction, such as contact information, a purchase, or another activity performed within the application.</p>

<p><strong>Features:</strong></p>

<ul>
  <li>Application support and transactions</li>
  <li>Relational data structure and SQL or document structure NoSQL</li>
  <li>Small queries for case analysis</li>
</ul>

<p><strong>Not Best For:</strong></p>

<ul>
  <li>Reporting system</li>
  <li>Large queries</li>
  <li>Centralized Big Data system</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-operational-data.png" alt="Data Engineering Process Fundamentals - Operational Data" title="Data Engineering Process Fundamentals - Operational Data" /></p>

<h3 id="data-lake---analytical-data-staging">Data Lake - Analytical Data Staging</h3>

<p>A Data Lake is an optimized storage system for Big Data scenarios. Its primary function is to store the data in its raw format without any transformation. Analytical data is the transaction data that has been extracted from a source system via a data pipeline as part of the data staging process.</p>

<p><strong>Features:</strong></p>

<ul>
  <li>Store the data in its raw format without any transformation</li>
  <li>This can include structured data like CSV files, semi-structured data like JSON and XML documents, or column-based data like Parquet files</li>
  <li>Low cost for massive storage power</li>
  <li>Not designed for querying or data analysis</li>
  <li>Its files are used as external tables by most analytical systems</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-staging-data.png" alt="Data Engineering Process Fundamentals - Analytical Data staging" /></p>

<h3 id="data-warehouse---analytical-data">Data Warehouse - Analytical Data</h3>

<p>A Data Warehouse is a centralized storage system that stores integrated data from multiple sources. The system is designed to host and serve Big Data scenarios at a lower operational cost than transactional databases, but at a higher cost than a Data Lake. This system hosts the analytical data that has been processed and is ready for analytical purposes.</p>

<p><strong>Data Warehouse Features:</strong></p>

<ul>
  <li>Stores historical data in relational tables with an optimized schema, which enables the data analysis process</li>
  <li>Provides SQL support to query the data</li>
  <li>It can integrate external resources, like CSV and Parquet files stored in data lakes, as external tables</li>
  <li>The system is designed to host and serve Big Data scenarios. It is not meant to be used as a transactional system</li>
  <li>Storage is more expensive</li>
  <li>Offloads archived data to Data Lakes</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-analytical-data.png" alt="Data Engineering Process Fundamentals - Analytical Data Store" /></p>

<h3 id="discovery---data-analysis">Discovery - Data Analysis</h3>

<p>During the discovery phase of a Data Engineering Process, we look to identify and clearly document a problem statement, which helps us understand what we are trying to solve. We also define our analytical approach and make observations about the data, its structure, and its source. This leads us into defining the requirements for the project, so we can define the scope, design, and architecture of the solution.</p>

<ul>
  <li>Download sample data files</li>
  <li>Run experiments to make observations</li>
  <li>Write Python scripts using VS Code or Jupyter Notebooks</li>
  <li>Transform the data with Pandas</li>
  <li>Make charts with Plotly</li>
  <li>Document the requirements</li>
</ul>

<p><img src="../../assets/2023/ozkary-data-engineering-process-jupyter-read-file.png" alt="Data Engineering Process Fundamentals - Data Analysis and discovery" /></p>

<h3 id="design-and-planning">Design and Planning</h3>

<p>The design and planning phase of a data engineering project is crucial for laying out the foundation of a successful system. It involves defining the system architecture, designing data pipelines, implementing source control practices, ensuring continuous integration and deployment (CI/CD), and leveraging tools like Docker and Terraform for infrastructure automation.</p>

<ul>
  <li>Use GitHub for the code repository and CI/CD actions</li>
  <li>Use Terraform, an Infrastructure as Code (IaC) tool, to manage cloud resources across multiple cloud providers</li>
  <li>Use Docker containers to run the code and manage its dependencies</li>
</ul>

<p><img src="../../assets/2023/ozkary-data-engineering-design-terraform-docker.png" alt="Data Engineering Process Fundamentals - Design and Planning" /></p>

<h3 id="data-lake---pipeline-and-orchestration">Data Lake - Pipeline and Orchestration</h3>

<p>A data pipeline is basically a workflow of tasks that can be executed in Docker containers. The execution, scheduling, managing and monitoring of the pipeline is referred to as orchestration. In order to support the operations of the pipeline and its orchestration, we need to provision a VM and data lake, and monitor cloud resources.</p>

<ul>
  <li>This can be code-centric, leveraging languages like Python</li>
  <li>Or a low-code approach, utilizing tools such as Azure Data Factory, which provides a turn-key solution</li>
  <li>Monitor services enable us to track telemetry data</li>
  <li>Docker Hub and GitHub can be used for the CI/CD process</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-data-pipeline.png" alt="Data Engineering Process Fundamentals - Data Lake - Data Pipeline and Orchestration" /></p>

<h3 id="data-warehouse---design-and-implementation">Data Warehouse - Design and Implementation</h3>

<p>In the design phase, we lay the groundwork by defining the database system, schema model, and technology stack required to support the data warehouse’s implementation and operations. In the implementation phase, we focus on converting conceptual data models into a functional system. By creating concrete structures like dimension and fact tables and performing data transformation tasks, including data cleansing, integration, and scheduled batch loading, we ensure that raw data is processed and unified for analysis. The goal is a repeatable and extendable process.</p>

<p><img src="../../assets/2023/ozkary-data-engineering-process-data-warehouse-design.png" alt="Data Engineering Process Fundamentals - Data Warehouse Design and Implementation" /></p>

<h3 id="data-warehouse---data-analysis">Data Warehouse - Data Analysis</h3>

<p>Data analysis is the practice of exploring data and understanding its meaning. It involves activities that can help us achieve a specific goal, such as identifying data dimensions and measures, as well as data analysis to identify outliers, trends, and distributions.</p>

<ul>
  <li>We can accomplish these activities by writing code with Python, Pandas, and SQL, using Visual Studio Code or Jupyter Notebooks.</li>
  <li>What’s more, we can use libraries such as Plotly to generate visuals that help us further analyze the data and create prototypes.</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-fundamentals-data-analysis-code.png" alt="Data Engineering Process Fundamentals - Data Analysis" /></p>

<h3 id="data-analysis-and-visualization">Data Analysis and Visualization</h3>

<p>Data visualization is a powerful tool that takes the insights derived from data analysis and presents them in a visual format. While tables with numbers on a report provide raw information, visualizations allow us to grasp complex relationships and trends at a glance.</p>

<ul>
  <li>Dashboards, in particular, bring together various visual components like charts, graphs, and scorecards into a unified interface that can help us tell a story</li>
  <li>Use tools like PowerBI, Looker, Tableau to model the data and create enterprise level visualizations</li>
</ul>

<p><img src="../../assets/2023/ozkary-data-engineering-process-data-analysis-visualization-dashboard.png" alt="Data Engineering Process Fundamentals - Data Visualization" /></p>

<h3 id="conclusion">Conclusion</h3>

<p>Both data lakes and data warehouses are essential components of a data engineering project. The primary function of a data lake is to store large amounts of operational data in its raw format, serving as a staging area for analytical processes. In contrast, a data warehouse acts as a centralized repository for information, enabling engineers to transform, process, and store extensive data. This allows the analytical team to utilize coding languages like Python and tools such as Jupyter Notebooks, as well as low-code platforms like Looker Studio and Power BI, to create enterprise-quality dashboards for the organization.</p>

<p><strong>Upcoming Talks:</strong></p>

<p>Join us for subsequent sessions in our Data Engineering Process Fundamentals series, where we will delve deeper into specific facets of data engineering, exploring topics such as data modeling, pipelines, and best practices in data governance.</p>

<p>This presentation is based on the book, <a href="https://www.amazon.com/Data-Engineering-Process-Fundamentals-Hands/dp/B0CV7TPSNB">Data Engineering Process Fundamentals</a>, which provides a more comprehensive guide to the topics we’ll cover. You can find all the sample code and datasets used in this presentation on our popular GitHub repository <a href="https://github.com/ozkary/data-engineering-mta-turnstile">Introduction to Data Engineering Process Fundamentals</a>.</p>

<p>Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!</p>

<ul>
  <li><a href="https://gdg.community.dev/gdg-broward-county-fl/">Google Developer Group</a></li>
  <li><a href="https://github.com/ozkary">GitHub</a></li>
  <li><a href="https://x.com/ozkary">Twitter</a></li>
  <li><a href="https://www.youtube.com/@ozkary">YouTube</a></li>
  <li><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></li>
</ul>

<p>👍 Originally published by <a href="https://www.ozkary.com">ozkary.com</a></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="code" /><category term="cloud" /><category term="github" /><category term="data lake" /><category term="data warehouse" /><category term="vscode" /><category term="docker" /><summary type="html"><![CDATA[Data Lakes and Data Warehouses. We will explore their roles, differences, and how they collectively empower organizations to harness the true potential of their data.]]></summary></entry><entry><title type="html">From Blueprint to Build - The Design and Planning Phase in Data Engineering</title><link href="https://www.ozkary.dev/data-engineering-process-fundamentals-from-blue-print-to-build-design-planning-phase/" rel="alternate" type="text/html" title="From Blueprint to Build - The Design and Planning Phase in Data Engineering" /><published>2025-09-29T00:00:00-04:00</published><updated>2025-09-29T09:00:00-04:00</updated><id>https://www.ozkary.dev/data-engineering-process-fundamentals-from-blue-print-to-build-design-planning-phase</id><content type="html" xml:base="https://www.ozkary.dev/data-engineering-process-fundamentals-from-blue-print-to-build-design-planning-phase/"><![CDATA[<h1 id="overview">Overview</h1>

<p>The design and planning phase of a data engineering project is crucial for laying out the foundation of a successful and scalable solution. This phase ensures that the architecture is strategically aligned with business objectives, optimizes resource utilization, and mitigates potential risks.</p>

<p><img src="../../assets/2025/ozkary-design-and-planning-phase-data-engineering-process-fundamentals.png" alt="Data Engineering Process Fundamentals" title="Data Engineering Process Fundamentals" /></p>

<ul>
  <li>Follow this GitHub repo during the presentation: (Give it a star)</li>
</ul>

<blockquote>
  <p>👉 https://github.com/ozkary/data-engineering-mta-turnstile</p>
</blockquote>

<ul>
  <li>Read more information on my blog at:</li>
</ul>

<blockquote>
  <p>👉 https://www.ozkary.com/2023/03/data-engineering-process-fundamentals.html</p>
</blockquote>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/BS2tFYPTcCo?si=__rZnSMaRMZdPvd9" title="Data Engineering Process Fundamentals - Design and Planning" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe>

<h3 id="video-agenda">Video Agenda</h3>

<p>In this session, we embark on the next chapter of our data journey, delving into the critical Design and Planning Phase. As we transition from discovery to design, we’ll unravel the intricacies of:</p>

<p>System Design and Architecture:</p>

<ul>
  <li>Understanding the foundational principles that shape a robust and scalable data system.</li>
</ul>

<p>Data Pipeline and Orchestration:</p>

<ul>
  <li>Uncovering the essentials of designing an efficient data pipeline and orchestrating seamless data flows.</li>
</ul>

<p>Source Control and Deployment:</p>

<ul>
  <li>Navigating the best practices for source control, versioning, and deployment strategies.</li>
</ul>

<p>CI/CD in Data Engineering:</p>

<ul>
  <li>Implementing Continuous Integration and Continuous Deployment (CI/CD) practices for agility and reliability.</li>
</ul>

<p>Docker Container and Docker Hub:</p>

<ul>
  <li>Harnessing the power of Docker containers and Docker Hub for containerized deployments.</li>
</ul>

<p>Cloud Infrastructure with IaC:</p>

<ul>
  <li>Exploring technologies for building out cloud infrastructure using Infrastructure as Code (IaC), ensuring efficiency and consistency.</li>
</ul>

<p><strong>Why Join:</strong></p>

<ul>
  <li>
    <p>Gain insights into designing scalable and efficient data systems.</p>
  </li>
  <li>
    <p>Learn best practices for cloud infrastructure and IaC.</p>
  </li>
  <li>
    <p>Discover the importance of data pipeline orchestration and source control.</p>
  </li>
  <li>
    <p>Explore the world of CI/CD in the context of data engineering.</p>
  </li>
  <li>
    <p>Unlock the potential of Docker containers for your data workflows.</p>
  </li>
</ul>

<p><strong>Some of the technologies that we will be covering:</strong></p>

<ul>
  <li>Cloud Infrastructure</li>
  <li>Data Pipelines</li>
  <li>GitHub and Actions</li>
  <li>VSCode</li>
  <li>Docker and Docker Hub</li>
  <li>Terraform</li>
</ul>

<h2 id="presentation">Presentation</h2>

<h3 id="data-engineering-overview">Data Engineering Overview</h3>

<p>A Data Engineering Process involves executing steps to understand the problem, scope, design, and architecture for creating a solution. This enables ongoing big data analysis using analytical and visualization tools.</p>

<h4 id="topics">Topics</h4>

<ul>
  <li>Importance of Design and Planning</li>
  <li>System Design and Architecture</li>
  <li>Data Pipeline and Orchestration</li>
  <li>Source Control and CI/CD</li>
  <li>Docker Containers</li>
  <li>Cloud Infrastructure with IaC</li>
</ul>

<p><strong>Follow this project: Give a star</strong></p>

<blockquote>
  <p>👉 <a href="//github.com/ozkary/data-engineering-mta-turnstile">Data Engineering Process Fundamentals</a></p>
</blockquote>

<h3 id="importance-of-design-and-planning">Importance of Design and Planning</h3>

<p>The design and planning phase of a data engineering project is crucial for laying out the foundation of a successful and scalable solution. This phase ensures that the architecture is strategically aligned with business objectives, optimizes resource utilization, and mitigates potential risks.</p>

<h4 id="foundational-areas">Foundational Areas</h4>

<ul>
  <li>Design the data pipeline and technology specifications, including flows, coding language, data governance, and tools</li>
  <li>Define the system architecture, including cloud services for scalability and the data platform</li>
  <li>Set up source control and deployment automation with CI/CD</li>
  <li>Use Docker containers for environment isolation to avoid deployment issues</li>
  <li>Automate the infrastructure with Terraform or cloud CLI tools</li>
  <li>Plan for system monitoring, notification, and recovery</li>
</ul>

<p><img src="../../assets/2023/ozkary-data-engineering-process-design-planning.png" alt="Data Engineering Process Fundamentals - Design and Planning" title="Data Engineering Process Fundamentals - Design and Planning" /></p>

<h3 id="system-design-and-architecture">System Design and Architecture</h3>

<p>In a system design, we need to clearly define the different technologies that should be used for each area of the solution. It includes the high-level system architecture, which defines the different components and their integration.</p>

<ul>
  <li>
    <p>The <strong>design</strong> outlines the technical solution, including system architecture, data integration, flow orchestration, storage platforms, and data processing tools. It focuses on defining technologies for each component to ensure a cohesive and efficient solution.</p>
  </li>
  <li>
    <p>A <strong>system architecture</strong> is a critical high-level design encompassing various components such as data sources, ingestion resources, workflow orchestration, storage, transformation services, continuous ingestion, validation mechanisms, and analytics tools.</p>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-architecture-stream.png" alt="Data Engineering Process Fundamentals - System Architecture" title="Data Engineering Process Fundamentals - System Architecture" /></p>

<h3 id="data-pipeline-and-orchestration">Data Pipeline and Orchestration</h3>

<p>A data pipeline is basically a workflow of tasks that can be executed in Docker containers. The execution, scheduling, managing and monitoring of the pipeline is referred to as orchestration. In order to support the operations of the pipeline and its orchestration, we need to provision a VM and data lake, and monitor cloud resources.</p>

<ul>
  <li>This can be code-centric, leveraging languages like Python and SQL</li>
  <li>Or a low-code approach, utilizing tools such as Azure Data Factory, which provides a turn-key solution</li>
  <li>Monitor services enable us to track telemetry data to support operational requirements</li>
  <li>Docker Hub and GitHub can be used for the CI/CD process to deploy our code-centric solutions</li>
  <li>Scheduling, recovery from failures, and dashboards are essential for orchestration</li>
</ul>

<p><img src="../../assets/2023/ozkary-data-engineering-process-pipeline-orchestration-architecture.png" alt="Data Engineering Process Fundamentals - Data Pipeline" title="Data Engineering Process Fundamentals - Data Pipeline" /></p>

<h3 id="source-control---cicd">Source Control - CI/CD</h3>

<p>Implementing source control practices alongside Continuous Integration and Continuous Delivery (CI/CD) pipelines is vital for facilitating agile development. This ensures efficient collaboration, change tracking, and seamless code deployment, crucial for addressing ongoing feature changes, bug fixes, and new environment deployments.</p>

<ul>
  <li>Systems like Git facilitate effective code and configuration file management, enabling collaboration and change tracking.</li>
  <li>Platforms such as GitHub enhance collaboration by providing a remote repository for sharing code.</li>
  <li>CI involves integrating code changes into a central repository, followed by automated build and test processes to validate changes and provide feedback.</li>
  <li>CD automates the deployment of code builds to various environments, such as staging and production, streamlining the release process and ensuring consistency across environments.</li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-ci-cd.png" alt="Data Engineering Process Fundamentals - GitHub CI/CD" title="Data Engineering Process Fundamentals - GitHub CI/CD" /></p>

<h3 id="docker-container-and-docker-hub">Docker Container and Docker Hub</h3>

<p>Docker proves invaluable for our data pipelines by providing self-contained environments with all necessary dependencies. With Docker Hub, we can effortlessly distribute pipeline images, facilitating swift and reliable provisioning of new environments.</p>

<ul>
  <li>Docker containers streamline the deployment process by encapsulating application and dependency configurations, reducing runtime errors.</li>
  <li>Containerizing data pipelines ensures reliability and portability by packaging all necessary components within a single container image.</li>
  <li>Docker Hub serves as a centralized container registry, enabling seamless image storage and distribution for streamlined environment provisioning and scalability.</li>
</ul>

<p><img src="../../assets/2023/ozkary-data-engineering-design-terraform-docker.png" alt="Data Engineering Process Fundamentals - Docker" title="Data Engineering Process Fundamentals - Docker" /></p>

<h3 id="cloud-infrastructure-with-iac">Cloud Infrastructure with IaC</h3>

<p>Infrastructure automation is crucial for maintaining consistency, scalability, and reliability across environments. By defining infrastructure as code (IaC), organizations can efficiently provision and modify cloud resources, mitigating manual errors.</p>

<ul>
  <li>Define infrastructure configurations as code, ensuring consistency across environments.</li>
  <li>Easily scale resources up or down to meet changing demands with code-defined infrastructure.</li>
  <li>Reduce manual errors and ensure reproducibility by automating resource provisioning and management.</li>
  <li>Track infrastructure changes under version control, enabling collaboration and ensuring auditability.</li>
  <li>Track infrastructure state, allowing for precise updates and minimizing drift between desired and actual configurations.</li>
</ul>

<p><img src="../../assets/2023/ozkary-data-engineering-terraform.png" alt="Data Engineering Process Fundamentals - Terraform" title="Data Engineering Process Fundamentals - Terraform" /></p>

<h2 id="summary">Summary</h2>

<p>The design and planning phase of a data engineering project sets the stage for success. From designing the system architecture and data pipelines to implementing source control, CI/CD, Docker, and infrastructure automation with Terraform, every aspect contributes to efficient and reliable deployment. Infrastructure automation, in particular, plays a critical role by simplifying provisioning of cloud resources, ensuring consistency, and enabling scalability, ultimately leading to a robust and manageable data engineering system.</p>

<p><strong>Upcoming Talks:</strong></p>

<p>Join us for subsequent sessions in our Data Engineering Process Fundamentals series, where we will delve deeper into specific facets of data engineering, exploring topics such as data modeling, pipelines, and best practices in data governance.</p>

<p>This presentation is based on the book, <a href="https://www.amazon.com/Data-Engineering-Process-Fundamentals-Hands/dp/B0CV7TPSNB">Data Engineering Process Fundamentals</a>, which provides a more comprehensive guide to the topics we’ll cover. You can find all the sample code and datasets used in this presentation on our popular GitHub repository <a href="https://github.com/ozkary/data-engineering-mta-turnstile">Introduction to Data Engineering Process Fundamentals</a>.</p>

<p>Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!</p>

<ul>
  <li><a href="https://gdg.community.dev/gdg-broward-county-fl/">Google Developer Group</a></li>
  <li><a href="https://github.com/ozkary">GitHub</a></li>
  <li><a href="https://x.com/ozkary">Twitter</a></li>
  <li><a href="https://www.youtube.com/@ozkary">YouTube</a></li>
  <li><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></li>
</ul>

<p>👍 Originally published by <a href="https://www.ozkary.com">ozkary.com</a></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="code" /><category term="cloud" /><category term="github" /><category term="vscode" /><category term="docker" /><category term="terraform" /><category term="data lake" /><summary type="html"><![CDATA[The design and planning phase of a data engineering project is crucial for laying out the foundation of a successful and scalable solution. This phase ensures that the architecture is strategically aligned with business objectives, optimizes resource utilization, and mitigates potential risks.]]></summary></entry><entry><title type="html">From Raw Data to Roadmap: The Discovery Phase in Data Engineering Process Fundamentals</title><link href="https://www.ozkary.dev/data-engineering-process-fundamentals-from-raw-data-to-roadmap-the-discovery-phase/" rel="alternate" type="text/html" title="From Raw Data to Roadmap: The Discovery Phase in Data Engineering Process Fundamentals" /><published>2025-08-27T00:00:00-04:00</published><updated>2025-08-27T09:00:00-04:00</updated><id>https://www.ozkary.dev/data-engineering-process-fundamentals-from-raw-data-to-roadmap-the-discovery-phase</id><content type="html" xml:base="https://www.ozkary.dev/data-engineering-process-fundamentals-from-raw-data-to-roadmap-the-discovery-phase/"><![CDATA[<h1 id="overview">Overview</h1>

<p>The discovery process involves identifying the problem, analyzing data sources, defining project requirements, establishing the project scope, and designing an effective architecture to address the identified challenges.</p>

<p>In this session, we will delve into the essential building blocks of data engineering, placing a spotlight on the discovery process. From framing the problem statement to navigating the intricacies of exploratory data analysis (EDA) using Python, VSCode, Jupyter Notebooks, and GitHub, you’ll gain a solid understanding of the fundamental aspects that drive effective data engineering projects.</p>

<blockquote>
  <p>DevFest Series
Data Engineering Process Fundamentals Series</p>
</blockquote>

<p><img src="../../assets/2025/ozkary-data-engineering-process-fundamentals-from-raw-data-to-roadmap-discovery-phase.png" alt="From Raw Data to Roadmap: The Discovery Phase in Data Engineering - Data Engineering Process Fundamentals" title="From Raw Data to Roadmap: The Discovery Phase in Data Engineering - Data Engineering Process Fundamentals" /></p>

<ul>
  <li>Follow this GitHub repo during the presentation (give it a star):</li>
</ul>

<blockquote>
  <p>👉 <a href="https://github.com/ozkary/data-engineering-mta-turnstile">GitHub Repo</a></p>
</blockquote>

<ul>
  <li>Jupyter Notebook:</li>
</ul>

<blockquote>
  <p>👉 <a href="https://github.com/ozkary/data-engineering-mta-turnstile/blob/main/Step1-Discovery/mta_discovery.ipynb">Jupyter Notebook</a></p>
</blockquote>

<ul>
  <li>Data engineering Series:</li>
</ul>

<blockquote>
  <p>👉 <a href="https://www.ozkary.com/2023/03/data-engineering-process-fundamentals.html">Blog Series</a></p>
</blockquote>

<blockquote>
  <p>👉 <a href="https://www.amazon.com/Data-Engineering-Process-Fundamentals-Hands/dp/B0CV7TPSNB">Data Engineering Book on Amazon</a></p>
</blockquote>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/UVS7A_3CGlU?si=rX_uwaTrAciyhmGa" title="From Raw Data to Roadmap: The Discovery Phase in Data Engineering - Data Engineering Process Fundamentals" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe>

<h3 id="video-agenda">Video Agenda</h3>

<p>In this session, we will delve into the essential building blocks of data engineering, placing a spotlight on the discovery process. From framing the problem statement to navigating the intricacies of exploratory data analysis (EDA), data modeling using Python, VS Code, Jupyter Notebooks, SQL, and GitHub, you’ll gain a solid understanding of the fundamental aspects that drive effective data engineering projects.</p>

<ol>
  <li>
    <p>Introduction:</p>

    <ul>
      <li>The “Why”: We’ll discuss why understanding your data upfront is crucial for success.</li>
      <li>The Problem: We’ll introduce a real-world problem that will guide our exploration.</li>
    </ul>
  </li>
  <li>
    <p>Data Loading and Preparation:</p>

    <ul>
      <li>Loading: We’ll demonstrate how to efficiently load data from an online source directly into our workspace.</li>
      <li>Structuring: We’ll prepare the loaded data for analysis, making it easy to work with.</li>
    </ul>
  </li>
  <li>
    <p>Exploratory Data Analysis (EDA):</p>

    <ul>
      <li>First Look: We’ll learn how to quickly generate and interpret summary statistics for our data.</li>
      <li>The Story: We’ll use these statistics to understand the data’s characteristics and identify any red flags or anomalies.</li>
    </ul>
  </li>
  <li>
    <p>Data Cleaning and Modeling:</p>

    <ul>
      <li>Cleaning: We’ll identify and handle common data issues like missing values and inconsistencies.</li>
      <li>Modeling: We’ll organize our data into separate tables for dimensions (descriptive attributes) and facts (measurable values).</li>
    </ul>
  </li>
  <li>
    <p>Visualization and Real-World Application:</p>

    <ul>
      <li>Bringing it to Life: We’ll create charts to visualize the data and find patterns.</li>
      <li>Solving the Problem: We’ll apply the insights gained to address our original problem and discuss practical solutions.</li>
    </ul>
  </li>
</ol>

<p>Key Takeaways:</p>

<ul>
  <li>Mastery of the foundational aspects of data engineering.</li>
  <li>Hands-on experience with EDA techniques, emphasizing the discovery phase.</li>
  <li>Appreciation for the value of a code-centric approach in the data engineering discovery process.</li>
</ul>

<p><strong>Upcoming Talks:</strong></p>

<p>Join us for subsequent sessions in our Data Engineering Process Fundamentals series, where we will delve deeper into specific facets of data engineering, exploring topics such as data modeling, pipelines, and best practices in data governance.</p>

<p>This presentation is based on the book, “Data Engineering Process Fundamentals,” which provides a more comprehensive guide to the topics we’ll cover. You can find all the sample code and datasets used in this presentation on our popular GitHub repository.</p>

<h2 id="presentation">Presentation</h2>

<h3 id="data-engineering-overview">Data Engineering Overview</h3>

<p>A Data Engineering Process involves executing steps to understand the problem, scope, design, and architecture for creating a solution. This enables ongoing big data analysis using analytical and visualization tools.</p>

<h4 id="topics">Topics</h4>

<ul>
  <li>Importance of the Discovery Process</li>
  <li>Setting the Stage - Technologies</li>
  <li>Exploratory Data Analysis (EDA)</li>
  <li>Code-Centric Approach</li>
  <li>Version Control</li>
  <li>Real-World Use Case</li>
</ul>

<p><strong>Follow this project and give it a star:</strong></p>
<blockquote>
  <p>👉 <a href="//github.com/ozkary/data-engineering-mta-turnstile">Data Engineering Process Fundamentals</a></p>
</blockquote>

<h3 id="importance-of-the-discovery-process">Importance of the Discovery Process</h3>

<p>The discovery process involves identifying the problem, analyzing data sources, defining project requirements, establishing the project scope, and designing an effective architecture to address the identified challenges.</p>

<ul>
  <li>Clearly document the problem statement to understand the challenges the project aims to address.</li>
  <li>Make observations about the data, its structure, and sources during the discovery process.</li>
  <li>Define project requirements based on the observations, enabling the team to understand the scope and goals.</li>
  <li>Clearly outline the scope of the project, ensuring a focused and well-defined set of objectives.</li>
  <li>Use insights from the discovery phase to inform the design of the solution, including data architecture.</li>
  <li>Develop a robust project architecture that aligns with the defined requirements and scope.</li>
</ul>

<p><img src="../../assets/2023/ozkary-data-engineering-process-discovery.png" alt="Data Engineering Process Fundamentals - Discovery Process" title="Data Engineering Process Fundamentals - Discovery Process" /></p>

<h3 id="setting-the-stage---technologies">Setting the Stage - Technologies</h3>

<p>To set the stage, we need to identify and select the tools that can facilitate the analysis and documentation of the data. Here are key technologies that play a crucial role in this stage:</p>

<ul>
  <li><strong>Python:</strong> A versatile programming language with rich libraries for data manipulation, analysis, and scripting.</li>
</ul>

<p><strong>Use Cases:</strong> Data download, cleaning, exploration, and scripting for automation.</p>

<ul>
  <li><strong>Jupyter Notebooks:</strong> An interactive tool for creating and sharing documents containing live code, visualizations, and narrative text.</li>
</ul>

<p><strong>Use Cases:</strong> Exploratory data analysis, documentation, and code collaboration.</p>

<ul>
  <li><strong>Visual Studio Code:</strong> A lightweight, extensible code editor with powerful features for source code editing and debugging.</li>
</ul>

<p><strong>Use Cases:</strong> Writing and debugging code, integrating with version control systems like GitHub.</p>

<ul>
  <li><strong>SQL (Structured Query Language):</strong> A domain-specific language for managing and manipulating relational databases.</li>
</ul>

<p><strong>Use Cases:</strong> Querying databases, data extraction, and transformation.</p>
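
<p>As a minimal sketch of how these tools work together (the file and column names here are hypothetical), Python can load a dataset with Pandas and stage it in SQLite so we can explore it with plain SQL:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

import pandas as pd

# Hypothetical sample file; replace with your own data source
df = pd.read_csv("turnstile_sample.csv")

# Stage the data frame in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
df.to_sql("turnstile", conn, index=False)

# Use SQL for a quick aggregation during discovery
query = """
SELECT station, COUNT(*) AS readings
FROM turnstile
GROUP BY station
ORDER BY readings DESC
LIMIT 10
"""
print(pd.read_sql_query(query, conn))
</code></pre></div></div>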

<p><img src="../../assets/2024/ozkary-data-engineering-process-discovery-tools.png" alt="Data Engineering Process Fundamentals - Discovery Tools" title="Data Engineering Process Fundamentals - Discovery Tools" /></p>

<h3 id="exploratory-data-analysis-eda">Exploratory Data Analysis (EDA)</h3>

<p>EDA is our go-to method for downloading, analyzing, understanding, and documenting the intricacies of the datasets. It’s like peeling back the layers of information to reveal the stories hidden within the data. Here’s what EDA is all about:</p>

<ul>
  <li>
    <p>EDA is the process of analyzing data to identify patterns, relationships, and anomalies, guiding the project’s direction.</p>
  </li>
  <li>
    <p>Python and Jupyter Notebooks together empower us to download, describe, and transform data through live queries.</p>
  </li>
  <li>
    <p>Insights gained from EDA set the foundation for informed decision-making in subsequent data engineering steps.</p>
  </li>
  <li>
    <p>Code written in Jupyter Notebooks can be exported and used as the starting point for data pipeline components and transformation services.</p>
  </li>
</ul>
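
<p>A minimal EDA sketch of this first look, assuming a CSV source (the URL and schema are placeholders, not the actual dataset):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# Download a dataset directly into the workspace (placeholder URL)
url = "https://example.com/data/turnstile_sample.csv"
df = pd.read_csv(url)

# First look: size, schema, and summary statistics
print(df.shape)
print(df.dtypes)
print(df.describe())
print(df.head())
</code></pre></div></div>

<p>The output of <code class="language-plaintext highlighter-rouge">describe()</code> gives counts, means, and ranges per numeric column, which is often enough to spot anomalies worth a closer look.</p>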

<p><img src="../../assets/2023/ozkary-data-engineering-jupyter-pie-chart.png" alt="Data Engineering Process Fundamentals - Discovery Pie Chart" title="Data Engineering Process Fundamentals - Discovery Pie Chart" /></p>

<h3 id="code-centric-approach">Code-Centric Approach</h3>

<p>A code-centric approach, using programming languages and tools in EDA, helps us understand the coding methodology for building data structures, defining schemas, and establishing relationships. This robust understanding seamlessly guides project implementation.</p>

<ul>
  <li>
    <p>Code delves deep into data intricacies, revealing integration and transformation challenges often unclear with visual tools.</p>
  </li>
  <li>
    <p>Code taps into the Pandas and NumPy libraries, enabling robust manipulation of data frames, definition of loading schemas, and handling of transformation needs; a short sketch follows this list.</p>
  </li>
  <li>
    <p>Code-centricity enables sophisticated analyses, covering aggregation, distribution, and in-depth examinations of the data.</p>
  </li>
  <li>
    <p>While visual tools have their merits, a code-centric approach excels in hands-on, detailed data exploration, uncovering subtle nuances and potential challenges.</p>
  </li>
</ul>
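
<p>The sketch below illustrates this approach with a hypothetical schema, splitting raw readings into a dimension table of descriptive attributes and a fact table of measures:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# Hypothetical raw readings: descriptive attributes mixed with measures
raw = pd.DataFrame({
    "station": ["Main St", "Main St", "Central"],
    "line": ["A", "A", "B"],
    "entries": [120, 95, 210],
})

# Dimension: unique descriptive attributes with a surrogate key
dim_station = raw[["station", "line"]].drop_duplicates().reset_index(drop=True)
dim_station["station_id"] = dim_station.index + 1

# Fact: measurable values keyed to the dimension
fact_entries = raw.merge(dim_station, on=["station", "line"])[["station_id", "entries"]]
print(fact_entries)
</code></pre></div></div>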

<p><img src="../../assets/2023/ozkary-data-engineering-process-jupyter-observations.png" alt="Data Engineering Process Fundamentals - Discovery Pie Chart" title="Data Engineering Process Fundamentals - Discovery Pie Chart" /></p>

<h3 id="version-control">Version Control</h3>

<p>Using a tool like GitHub is essential for effective version control and collaboration in our discovery process. GitHub enables us to track our exploratory code and Jupyter Notebooks, fostering collaboration, documentation, and comprehensive project management. Here’s how GitHub enhances our process:</p>

<ul>
  <li>
    <p><strong>Centralized Tracking:</strong> GitHub centralizes tracking and managing our exploratory code and Jupyter Notebooks, ensuring a transparent and organized record of our data exploration.</p>
  </li>
  <li>
    <p><strong>Sharing:</strong> Easily share code and Notebooks with team members on GitHub, fostering seamless collaboration and knowledge sharing.</p>
  </li>
  <li>
    <p><strong>Documentation:</strong> GitHub supports Markdown, enabling comprehensive documentation of processes, findings, and insights within the same repository.</p>
  </li>
  <li>
    <p><strong>Project Management:</strong> GitHub acts as a project management hub, facilitating CI/CD pipeline integration for smooth and automated delivery of data engineering projects.</p>
  </li>
</ul>

<p><img src="../../assets/2024/ozkary-data-engineering-process-problem-statement.png" alt="Data Engineering Process Fundamentals - Discovery Problem Statement" title="Data Engineering Process Fundamentals - Discovery Problem Statement" /></p>

<h2 id="summary-the-power-of-discovery">Summary: The Power of Discovery</h2>

<p>By mastering the discovery phase, you lay a strong foundation for successful data engineering projects. A thorough understanding of your data is essential for extracting meaningful insights.</p>

<ul>
  <li><strong>Understanding Your Data:</strong> The discovery phase is crucial for understanding your data’s characteristics, quality, and potential.</li>
  <li><strong>Exploratory Data Analysis (EDA):</strong> Use techniques to uncover patterns, trends, and anomalies.</li>
  <li><strong>Data Profiling:</strong> Assess data quality, identify missing values, and understand data distributions.</li>
  <li><strong>Data Cleaning:</strong> Address data inconsistencies and errors to ensure data accuracy (a short sketch follows this list).</li>
  <li><strong>Domain Knowledge:</strong> Leverage domain expertise to guide data exploration and interpretation.</li>
  <li><strong>Setting the Stage:</strong> Choose the right language and tools for efficient data exploration and analysis.</li>
</ul>
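
<p>A brief cleaning sketch, again with hypothetical columns, showing how profiling findings translate into concrete fixes:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# Hypothetical frame with common data-quality issues
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", None],
    "entries": ["120", "120", "95"],
})

# Profile: missing values and duplicate rows
print(df.isnull().sum())
print(df.duplicated().sum())

# Clean: drop duplicates, remove rows missing a date, fix types
df = df.drop_duplicates()
df = df.dropna(subset=["date"])
df["date"] = pd.to_datetime(df["date"])
df["entries"] = df["entries"].astype(int)
</code></pre></div></div>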

<p>The data engineering discovery process involves defining the problem statement, gathering requirements, and determining the scope of work. It also includes a data analysis exercise utilizing Python and Jupyter Notebooks or other tools to extract valuable insights from the data. These steps collectively lay the foundation for successful data engineering endeavors.</p>

<p>Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!</p>

<ul>
  <li><a href="https://gdg.community.dev/gdg-broward-county-fl/">Google Developer Group</a></li>
  <li><a href="https://github.com/ozkary">GitHub</a></li>
  <li><a href="https://x.com/ozkary">Twitter</a></li>
  <li><a href="https://www.youtube.com/@ozkary">YouTube</a></li>
  <li><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></li>
</ul>

<p>👍 Originally published by <a href="https://www.ozkary.com">ozkary.com</a></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="code" /><category term="cloud" /><category term="github" /><category term="vscode" /><category term="docker" /><summary type="html"><![CDATA[we will delve into the essential building blocks of data engineering, placing a spotlight on the discovery process. From framing the problem statement to navigating the intricacies of exploratory data analysis (EDA) using Python.]]></summary></entry><entry><title type="html">Discover AI Agent: A Primer’s Guide July 2025</title><link href="https://www.ozkary.dev/autonomous-ai-agent-primer-guide-july-2025/" rel="alternate" type="text/html" title="Discover AI Agent: A Primer’s Guide July 2025" /><published>2025-07-23T00:00:00-04:00</published><updated>2025-07-23T09:00:00-04:00</updated><id>https://www.ozkary.dev/autonomous-ai-agent-primer-guide-july-2025</id><content type="html" xml:base="https://www.ozkary.dev/autonomous-ai-agent-primer-guide-july-2025/"><![CDATA[<h1 id="overview">Overview</h1>

<p>What’s the AI agent mystique? Are they just chatbots with automation? What makes them different—and why does it matter?</p>

<p>This presentation breaks it down from the ground up. We’ll explore what truly sets AI agents apart—how they perceive, reason, and act with autonomy across industries ranging from healthcare to retail to logistics. You’ll walk away with a clear understanding of what an agent is, how it works, and what it takes to build one.</p>

<p>Whether you’re a developer, strategist, or simply curious, this session is your entry point to one of the most transformative ideas in AI today.</p>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agents-primer-guide.jpg" alt="Autonomous AI Agents a Primer's Guide" /></p>

<blockquote>
  <p>#BuildWithAI Series
July 2025 Presentation</p>
</blockquote>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/3tOK4nvEjOE?si=AW-73vcD6ids55mi" title="Autonomous AI Agent: A Primer's Guide" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<h2 id="github-repo">GitHub Repo</h2>

<p><a href="https://github.com/ozkary/ai-engineering/tree/main/ai-agents"><img src="https://img.shields.io/badge/GitHub-ai--agents-blue?logo=github" alt="Autonomous AI Agent - GitHub" /></a></p>

<h3 id="video-agenda">Video Agenda:</h3>

<ul>
  <li>What is an AI Agent?</li>
  <li>Autonomy Advantage: How AI Agents Go Beyond Automation</li>
  <li>The Agent’s Secret Power</li>
  <li>Model Context Protocol (MCP): The Key to Tool Integration</li>
  <li>How Does an Agent Talk MCP?</li>
  <li>Benefits of MCP for AI Agents</li>
  <li>Shape Agent Behavior Through Prompting</li>
</ul>

<h2 id="presentation">Presentation</h2>

<h3 id="what-is-an-ai-agent">What is an AI Agent?</h3>

<p>An AI agent is a software robot that observes what’s happening, figures out what to do, and then does it—all without a human needing to guide every step.</p>

<p><strong>Manufacturing Setting:</strong></p>

<ul>
  <li>Monitors sensor data in real time, comparing each new reading against control limits and recent patterns to detect drift, anomalies, or rule violations.</li>
  <li>Decides what needs to happen next—whether that’s pausing production, flagging maintenance, or adjusting inputs to keep the process stable.</li>
  <li>Acts without waiting for instructions, logging the event, alerting staff, or triggering automated workflows across connected systems.</li>
</ul>
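
<p>A minimal sketch of that loop in Python, where the sensor reading and the control limit are stand-ins rather than a production integration:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random
import time

CONTROL_LIMIT = 8.0  # hypothetical vibration threshold

def read_sensor():
    # Stand-in for a real sensor feed or message-queue read
    return random.uniform(6.0, 10.0)

def decide(reading):
    # Compare the new reading against the control limit
    return "alert" if reading &gt; CONTROL_LIMIT else "ok"

def act(decision, reading):
    if decision == "alert":
        # Here: notify staff, log the event, or trigger a workflow
        print(f"ALERT: reading {reading:.2f} exceeds control limit")
    else:
        print(f"ok: {reading:.2f}")

# Perceive, reason, act, without a human guiding every step
for _ in range(5):
    r = read_sensor()
    act(decide(r), r)
    time.sleep(1)
</code></pre></div></div>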

<blockquote>
  <p>“Now, you might wonder—how’s this different from just traditional automation?”</p>
</blockquote>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agent-design.jpg" alt="Autonomous AI Agents a Primer's Guide Design" /></p>

<h2 id="autonomy-advantage-how-ai-agents-go-beyond-automation">Autonomy Advantage: How AI Agents Go Beyond Automation</h2>

<p>Unlike scripted automation, an AI agent brings autonomy—acting with awareness, judgment, and initiative. It doesn’t just execute commands—it thinks.</p>

<ul>
  <li>
    <p><strong>Perception</strong> Observes real-time data from sensors, machines, and systems—just like a human operator watching a dashboard—but at higher speed and scale.</p>
  </li>
  <li>
    <p><strong>Reasoning</strong> Analyzes trends and patterns from recent data (its reasoning window) to assess stability, detect anomalies, or anticipate breakdowns—just like an engineer interpreting a control chart.</p>
  </li>
  <li>
    <p><strong>Action</strong> Takes initiative by triggering responses: adjusting inputs, alerting staff, logging events, or even halting production—without waiting for permission.</p>
  </li>
</ul>

<blockquote>
  <p>But, what powers this autonomy?</p>
</blockquote>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agent-perception-reasoning-action.jpg" alt="Autonomous AI Agents a Primer's Guide Design" /></p>

<h2 id="the-agents-secret-power">The Agent’s Secret Power</h2>

<p>An AI agent doesn’t just automate—it senses, thinks, and acts on its own. These core technologies are what give it autonomy.</p>

<p><strong>Manufacturing Setting:</strong></p>

<ul>
  <li><strong>Perception:</strong> Ingests real-time sensor data and stores recent readings in a reasoning window for short-term memory.</li>
  <li><strong>Reasoning:</strong> Uses an LLM (like Gemini) to analyze trends, detect rule violations, and interpret process behavior—beyond rigid logic.</li>
  <li><strong>Action:</strong> Executes commands using predefined tools via MCP—like notifying staff, triggering scripts, or calling APIs.</li>
</ul>
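
<p>A small sketch of the reasoning window idea, where a fixed-size buffer of recent readings becomes context for the LLM call (the window size and prompt wording are assumptions):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import deque

# Short-term memory: keep only the most recent readings
reasoning_window = deque(maxlen=20)

def perceive(reading):
    reasoning_window.append(reading)

def build_prompt():
    # The recent window becomes context for the reasoning step
    samples = ", ".join(f"{r:.2f}" for r in reasoning_window)
    return (
        "You monitor vibration data and apply SPC rules. "
        f"Recent samples: {samples}. "
        "Report any drift, anomalies, or rule violations."
    )

for value in [7.1, 7.3, 7.2, 9.8]:
    perceive(value)
print(build_prompt())  # this text would be sent to the LLM
</code></pre></div></div>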

<blockquote>
  <p>Wait, what are MCP tools?</p>
</blockquote>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agent-mcp-tools.jpg" alt="Autonomous AI Agents a Primer's Guide Design" /></p>

<h2 id="model-context-protocol-mcp-the-key-to-tool-integration">Model Context Protocol (MCP): The Key to Tool Integration</h2>

<p>MCP is a communication framework that lets AI agents use tools—like APIs, databases, or notifications—by expressing intent in structured language.</p>

<ul>
  <li><strong>Triggering a Notification:</strong> The agent says <code class="language-plaintext highlighter-rouge">@notify: supervisor_alert("Vibration spike detected on motor_3A")</code>, and MCP delivers a formatted message via email, SMS, or system alert.</li>
</ul>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">POST /alerts/send</span>
<span class="na">Content-Type</span><span class="pi">:</span> <span class="s">application/json</span>

<span class="pi">{</span>
  <span class="s2">"</span><span class="s">recipient"</span><span class="pi">:</span> <span class="s2">"</span><span class="s">supervisor_team"</span><span class="pi">,</span>
  <span class="s2">"</span><span class="s">message"</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Vibration</span><span class="nv"> </span><span class="s">spike</span><span class="nv"> </span><span class="s">detected</span><span class="nv"> </span><span class="s">on</span><span class="nv"> </span><span class="s">motor_3A"</span><span class="pi">,</span>
  <span class="s2">"</span><span class="s">priority"</span><span class="pi">:</span> <span class="s2">"</span><span class="s">high"</span>
<span class="pi">}</span>
</code></pre></div></div>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">tool</span><span class="pi">:</span> <span class="s">notify_supervisor</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Sends an alert message to the assigned supervisor team</span>
<span class="na">parameters</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">message</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
    <span class="na">required</span><span class="pi">:</span> <span class="no">true</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">The alert message to send</span>
<span class="na">example_call</span><span class="pi">:</span> <span class="s2">"</span><span class="s">@notify:</span><span class="nv"> </span><span class="s">supervisor_alert(</span><span class="se">\"</span><span class="s">Vibration</span><span class="nv"> </span><span class="s">spike</span><span class="nv"> </span><span class="s">detected</span><span class="nv"> </span><span class="s">on</span><span class="nv"> </span><span class="s">motor_3A</span><span class="se">\"</span><span class="s">)"</span>
<span class="na">execution</span><span class="pi">:</span>
  <span class="na">type</span><span class="pi">:</span> <span class="s">webhook</span>
  <span class="na">method</span><span class="pi">:</span> <span class="s">POST</span>
  <span class="na">endpoint</span><span class="pi">:</span> <span class="s">https://factory.opsys.com/alerts/send</span>
  <span class="na">payload_mapping</span><span class="pi">:</span>
    <span class="na">recipient</span><span class="pi">:</span> <span class="s2">"</span><span class="s">supervisor_team"</span>
    <span class="na">message</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
    <span class="na">priority</span><span class="pi">:</span> <span class="s2">"</span><span class="s">high"</span>
</code></pre></div></div>

<h2 id="how-does-the-agent-understand-mcp">How Does the Agent Understand MCP?</h2>

<p>When an agent makes a decision, it doesn’t call a function directly—it <em>declares intent</em> using a structured phrase. MCP translates that intent into a real-world action by matching it to a predefined tool. Essentially, the agent reads the tool metadata as a prompt.</p>

<p><strong>Agent says:</strong></p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@notify: supervisor_alert("Vibration spike detected on motor_3A")
</code></pre></div></div>

<p><strong>In Action:</strong></p>

<ul>
  <li><strong>Agent</strong> emits intent using MCP syntax: <code class="language-plaintext highlighter-rouge">@notify: supervisor_alert("Vibration spike detected on motor_3A")</code>.</li>
  <li><strong>MCP</strong> matches the function name (<code class="language-plaintext highlighter-rouge">supervisor_alert</code>) to a registered tool.</li>
  <li><strong>Execution Engine</strong> constructs the proper HTTP request using metadata, endpoint URL, method, headers, authentication.</li>
  <li><strong>Action</strong> is performed: supervisor is notified via the external system.</li>
</ul>
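
<p>To make the flow concrete, here is a simplified dispatch sketch in Python. It is not the actual MCP implementation; the registry simply mirrors the hypothetical tool metadata above:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import re
import urllib.request

# Hypothetical registry mirroring the YAML tool metadata
TOOLS = {
    "supervisor_alert": {
        "endpoint": "https://factory.opsys.com/alerts/send",
        "recipient": "supervisor_team",
        "priority": "high",
    }
}

def dispatch(intent):
    # Match the structured intent, e.g. @notify: supervisor_alert("...")
    m = re.match(r'@notify:\s*(\w+)\("(.+)"\)', intent)
    if not m:
        raise ValueError("unrecognized intent")
    tool = TOOLS[m.group(1)]
    payload = {
        "recipient": tool["recipient"],
        "message": m.group(2),
        "priority": tool["priority"],
    }
    request = urllib.request.Request(
        tool["endpoint"],
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # performs the POST

dispatch('@notify: supervisor_alert("Vibration spike detected on motor_3A")')
</code></pre></div></div>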

<blockquote>
  <p>The agent just describes what needs to happen. MCP handles the how.</p>
</blockquote>

<h2 id="benefits-of-mcp-for-ai-agents">Benefits of MCP for AI Agents</h2>

<p>MCP gives AI agents the flexibility and intelligence to grow beyond fixed automation—enabling them to explore, understand, and apply tools in dynamic environments.</p>

<ul>
  <li><strong>Dynamic Tool Discovery:</strong> Agents can learn about and use new tools without explicit programming.</li>
  <li><strong>Human-like Tool Usage:</strong> Agents leverage tools based on their “understanding” of the tool’s purpose and capabilities, similar to how a human learns to use a new application.</li>
  <li><strong>Enhanced Functionality &amp; Adaptability:</strong> Unlocks a vast ecosystem of capabilities for autonomous agents.</li>
</ul>

<blockquote>
  <p><em>To act effectively, agents also need character—a defined role, a point of view, a way to think.</em></p>
</blockquote>

<h2 id="shape-agent-behavior-through-prompting">Shape Agent Behavior Through Prompting</h2>

<p>Prompts are textual instructions or context provided to guide the agent’s behavior and reasoning. They are crucial for controlling and directing autonomous agents.</p>

<ul>
  <li>
    <p><strong>System Prompts:</strong> Define the agent’s identity, role, tone, and reasoning strategy. This is its operating character—guiding how it thinks across all interactions. <em>Example: “You are a manufacturing agent that monitors vibration data and applies SPC rules to detect risk.”</em></p>
  </li>
  <li>
    <p><strong>User/Agent Prompts:</strong> Deliver instructions in the moment. These guide the agent’s short-term focus and task-specific reasoning. <em>Example: “Analyze this new sample and let me know if we’re trending toward a shutdown.”</em></p>
  </li>
</ul>
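
<p>A minimal sketch of how the two prompt layers can be assembled as chat messages; the exact message format depends on the LLM client you use:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># System prompt: the agent's operating character
system_prompt = (
    "You are a manufacturing agent that monitors vibration data "
    "and applies SPC rules to detect risk."
)

# User prompt: the task at hand
user_prompt = (
    "Analyze this new sample and let me know if we're trending "
    "toward a shutdown: 9.8"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# messages would be passed to your model client of choice
print(messages)
</code></pre></div></div>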

<blockquote>
  <p>How do I get started?</p>
</blockquote>

<h2 id="getting-started-with-ai-agents-the-tech-stack">Getting Started with AI Agents: The Tech Stack</h2>

<p>To build your first AI agent, these tools offer a powerful foundation—though not the only options, they represent a well-integrated, production-ready ecosystem:</p>

<ul>
  <li>
    <p>LangChain: Core framework for integrating tools, memory, vector databases, and APIs. Think of it as the foundation that gives your agent capabilities.</p>
  </li>
  <li>
    <p>LangGraph: Adds orchestration and state management by turning your LangChain components into reactive, stateful workflows—ideal for agents that need long-term memory and conditional behavior.</p>
  </li>
  <li>
    <p>LangSmith: Monitoring and evaluation suite to observe, debug, and improve your agents—see how prompts, memory, and tools interact across sessions.</p>
  </li>
  <li>
    <p>n8n: No-code orchestration platform that lets you deploy agents into real-world business systems—perfect for automation without touching code.</p>
  </li>
</ul>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agent-langChain-LangGraph.jpg" alt="Autonomous AI Agents a Primer's Guide langChain LangGraph" /></p>

<p>Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!</p>

<ul>
  <li><a href="https://gdg.community.dev/gdg-broward-county-fl/">Google Developer Group</a></li>
  <li><a href="https://github.com/ozkary">GitHub</a></li>
  <li><a href="https://x.com/ozkary">Twitter</a></li>
  <li><a href="https://www.youtube.com/@ozkary">YouTube</a></li>
  <li><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></li>
</ul>

<p>👍 Originally published by <a href="https://www.ozkary.com">ozkary.com</a></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="AI" /><category term="Model Context protocol" /><category term="Entrepreneurs" /><category term="AI Agents" /><category term="LangChain" /><category term="LangGraph" /><summary type="html"><![CDATA[Explore what truly sets AI agents apart—how they perceive, reason, and act with autonomy across industries ranging from healthcare to retail to logistics.]]></summary></entry><entry><title type="html">Autonomous AI Agent: A Primer’s Guide June 2025</title><link href="https://www.ozkary.dev/autonomous-ai-agent-primer-guide/" rel="alternate" type="text/html" title="Autonomous AI Agent: A Primer’s Guide June 2025" /><published>2025-06-25T00:00:00-04:00</published><updated>2025-06-25T09:00:00-04:00</updated><id>https://www.ozkary.dev/autonomous-ai-agent-primer-guide</id><content type="html" xml:base="https://www.ozkary.dev/autonomous-ai-agent-primer-guide/"><![CDATA[<h1 id="overview">Overview</h1>

<p>What’s the AI agent mystique? Are they just chatbots with automation? What makes them different—and why does it matter?</p>

<p>This presentation breaks it down from the ground up. We’ll explore what truly sets AI agents apart—how they perceive, reason, and act with autonomy across industries ranging from healthcare to retail to logistics. You’ll walk away with a clear understanding of what an agent is, how it works, and what it takes to build one.</p>

<p>Whether you’re a developer, strategist, or simply curious, this session is your entry point to one of the most transformative ideas in AI today.</p>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agents-primer-guide.jpg" alt="Autonomous AI Agents a Primer's Guide" /></p>

<blockquote>
  <p>#BuildWithAI Series
June 2025 Presentation</p>
</blockquote>

<h2 id="youtube-video">YouTube Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/taVukQ79poU?si=fjmGD_QDmR8vzmbi" title="Autonomous AI Agent: A Primer's Guide" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<h2 id="github-repo">GitHub Repo</h2>

<p><a href="https://github.com/ozkary/ai-engineering/tree/main/ai-agents"><img src="https://img.shields.io/badge/GitHub-ai--agents-blue?logo=github" alt="Autonomous AI Agent - GitHub" /></a></p>

<h3 id="video-agenda">Video Agenda:</h3>

<ul>
  <li>What is an AI Agent?</li>
  <li>Autonomy Advantage: How AI Agents Go Beyond Automation</li>
  <li>The Agent’s Secret Power</li>
  <li>Model Context Protocol (MCP): The Key to Tool Integration</li>
  <li>How Does an Agent Talk MCP?</li>
  <li>Benefits of MCP for AI Agents</li>
  <li>Shape Agent Behavior Through Prompting</li>
</ul>

<h2 id="presentation">Presentation</h2>

<h3 id="what-is-an-ai-agent">What is an AI Agent?</h3>

<p>An AI agent is a software robot that observes what’s happening, figures out what to do, and then does it—all without a human needing to guide every step.</p>

<p><strong>Manufacturing Setting:</strong></p>

<ul>
  <li>Monitors sensor data in real time, comparing each new reading against control limits and recent patterns to detect drift, anomalies, or rule violations.</li>
  <li>Decides what needs to happen next—whether that’s pausing production, flagging maintenance, or adjusting inputs to keep the process stable.</li>
  <li>Acts without waiting for instructions, logging the event, alerting staff, or triggering automated workflows across connected systems.</li>
</ul>
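
<p>A minimal sketch of that loop in Python, where the sensor reading and the control limit are stand-ins rather than a production integration:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random
import time

CONTROL_LIMIT = 8.0  # hypothetical vibration threshold

def read_sensor():
    # Stand-in for a real sensor feed or message-queue read
    return random.uniform(6.0, 10.0)

def decide(reading):
    # Compare the new reading against the control limit
    return "alert" if reading &gt; CONTROL_LIMIT else "ok"

def act(decision, reading):
    if decision == "alert":
        # Here: notify staff, log the event, or trigger a workflow
        print(f"ALERT: reading {reading:.2f} exceeds control limit")
    else:
        print(f"ok: {reading:.2f}")

# Perceive, reason, act, without a human guiding every step
for _ in range(5):
    r = read_sensor()
    act(decide(r), r)
    time.sleep(1)
</code></pre></div></div>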

<blockquote>
  <p>“Now, you might wonder—how’s this different from just traditional automation?”</p>
</blockquote>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agent-design.jpg" alt="Autonomous AI Agents a Primer's Guide Design" /></p>

<h2 id="autonomy-advantage-how-ai-agents-go-beyond-automation">Autonomy Advantage: How AI Agents Go Beyond Automation</h2>

<p>Unlike scripted automation, an AI agent brings autonomy—acting with awareness, judgment, and initiative. It doesn’t just execute commands—it thinks.</p>

<ul>
  <li>
    <p><strong>Perception</strong> Observes real-time data from sensors, machines, and systems—just like a human operator watching a dashboard—but at higher speed and scale.</p>
  </li>
  <li>
    <p><strong>Reasoning</strong> Analyzes trends and patterns from recent data (its reasoning window) to assess stability, detect anomalies, or anticipate breakdowns—just like an engineer interpreting a control chart.</p>
  </li>
  <li>
    <p><strong>Action</strong> Takes initiative by triggering responses: adjusting inputs, alerting staff, logging events, or even halting production—without waiting for permission.</p>
  </li>
</ul>

<blockquote>
  <p>But, what powers this autonomy?</p>
</blockquote>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agent-perception-reasoning-action.jpg" alt="Autonomous AI Agents a Primer's Guide Design" /></p>

<h2 id="the-agents-secret-power">The Agent’s Secret Power</h2>

<p>An AI agent doesn’t just automate—it senses, thinks, and acts on its own. These core technologies are what give it autonomy.</p>

<p><strong>Manufacturing Setting:</strong></p>

<ul>
  <li><strong>Perception:</strong> Ingests real-time sensor data and stores recent readings in a reasoning window for short-term memory.</li>
  <li><strong>Reasoning:</strong> Uses an LLM (like Gemini) to analyze trends, detect rule violations, and interpret process behavior—beyond rigid logic.</li>
  <li><strong>Action:</strong> Executes commands using predefined tools via MCP—like notifying staff, triggering scripts, or calling APIs.</li>
</ul>
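
<p>A small sketch of the reasoning window idea, where a fixed-size buffer of recent readings becomes context for the LLM call (the window size and prompt wording are assumptions):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import deque

# Short-term memory: keep only the most recent readings
reasoning_window = deque(maxlen=20)

def perceive(reading):
    reasoning_window.append(reading)

def build_prompt():
    # The recent window becomes context for the reasoning step
    samples = ", ".join(f"{r:.2f}" for r in reasoning_window)
    return (
        "You monitor vibration data and apply SPC rules. "
        f"Recent samples: {samples}. "
        "Report any drift, anomalies, or rule violations."
    )

for value in [7.1, 7.3, 7.2, 9.8]:
    perceive(value)
print(build_prompt())  # this text would be sent to the LLM
</code></pre></div></div>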

<blockquote>
  <p>Wait, what are MCP tools?</p>
</blockquote>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agent-mcp-tools.jpg" alt="Autonomous AI Agents a Primer's Guide Design" /></p>

<h2 id="model-context-protocol-mcp-the-key-to-tool-integration">Model Context Protocol (MCP): The Key to Tool Integration</h2>

<p>MCP is a communication framework that lets AI agents use tools—like APIs, databases, or notifications—by expressing intent in structured language.</p>

<ul>
  <li><strong>Triggering a Notification:</strong> The agent says <code class="language-plaintext highlighter-rouge">@notify: supervisor_alert("Vibration spike detected on motor_3A")</code>, and MCP delivers a formatted message via email, SMS, or system alert.</li>
</ul>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">POST /alerts/send</span>
<span class="na">Content-Type</span><span class="pi">:</span> <span class="s">application/json</span>

<span class="pi">{</span>
  <span class="s2">"</span><span class="s">recipient"</span><span class="pi">:</span> <span class="s2">"</span><span class="s">supervisor_team"</span><span class="pi">,</span>
  <span class="s2">"</span><span class="s">message"</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Vibration</span><span class="nv"> </span><span class="s">spike</span><span class="nv"> </span><span class="s">detected</span><span class="nv"> </span><span class="s">on</span><span class="nv"> </span><span class="s">motor_3A"</span><span class="pi">,</span>
  <span class="s2">"</span><span class="s">priority"</span><span class="pi">:</span> <span class="s2">"</span><span class="s">high"</span>
<span class="pi">}</span>
</code></pre></div></div>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">tool</span><span class="pi">:</span> <span class="s">notify_supervisor</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Sends an alert message to the assigned supervisor team</span>
<span class="na">parameters</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">message</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
    <span class="na">required</span><span class="pi">:</span> <span class="no">true</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">The alert message to send</span>
<span class="na">example_call</span><span class="pi">:</span> <span class="s2">"</span><span class="s">@notify:</span><span class="nv"> </span><span class="s">supervisor_alert(</span><span class="se">\"</span><span class="s">Vibration</span><span class="nv"> </span><span class="s">spike</span><span class="nv"> </span><span class="s">detected</span><span class="nv"> </span><span class="s">on</span><span class="nv"> </span><span class="s">motor_3A</span><span class="se">\"</span><span class="s">)"</span>
<span class="na">execution</span><span class="pi">:</span>
  <span class="na">type</span><span class="pi">:</span> <span class="s">webhook</span>
  <span class="na">method</span><span class="pi">:</span> <span class="s">POST</span>
  <span class="na">endpoint</span><span class="pi">:</span> <span class="s">https://factory.opsys.com/alerts/send</span>
  <span class="na">payload_mapping</span><span class="pi">:</span>
    <span class="na">recipient</span><span class="pi">:</span> <span class="s2">"</span><span class="s">supervisor_team"</span>
    <span class="na">message</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
    <span class="na">priority</span><span class="pi">:</span> <span class="s2">"</span><span class="s">high"</span>
</code></pre></div></div>

<h2 id="how-does-the-agent-understand-mcp">How Does the Agent Understand MCP?</h2>

<p>When an agent makes a decision, it doesn’t call a function directly—it <em>declares intent</em> using a structured phrase. MCP translates that intent into a real-world action by matching it to a predefined tool. Essentially, the agent reads the tool metadata as a prompt.</p>

<p><strong>Agent says:</strong></p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@notify: supervisor_alert("Vibration spike detected on motor_3A")
</code></pre></div></div>

<p><strong>In Action:</strong></p>

<ul>
  <li><strong>Agent</strong> emits intent using MCP syntax: <code class="language-plaintext highlighter-rouge">@notify: supervisor_alert("Vibration spike detected on motor_3A")</code>.</li>
  <li><strong>MCP</strong> matches the function name (<code class="language-plaintext highlighter-rouge">supervisor_alert</code>) to a registered tool.</li>
  <li><strong>Execution Engine</strong> constructs the proper HTTP request using metadata, endpoint URL, method, headers, authentication.</li>
  <li><strong>Action</strong> is performed: supervisor is notified via the external system.</li>
</ul>
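
<p>To make the flow concrete, here is a simplified dispatch sketch in Python. It is not the actual MCP implementation; the registry simply mirrors the hypothetical tool metadata above:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import re
import urllib.request

# Hypothetical registry mirroring the YAML tool metadata
TOOLS = {
    "supervisor_alert": {
        "endpoint": "https://factory.opsys.com/alerts/send",
        "recipient": "supervisor_team",
        "priority": "high",
    }
}

def dispatch(intent):
    # Match the structured intent, e.g. @notify: supervisor_alert("...")
    m = re.match(r'@notify:\s*(\w+)\("(.+)"\)', intent)
    if not m:
        raise ValueError("unrecognized intent")
    tool = TOOLS[m.group(1)]
    payload = {
        "recipient": tool["recipient"],
        "message": m.group(2),
        "priority": tool["priority"],
    }
    request = urllib.request.Request(
        tool["endpoint"],
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # performs the POST

dispatch('@notify: supervisor_alert("Vibration spike detected on motor_3A")')
</code></pre></div></div>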

<blockquote>
  <p>The agent just describes what needs to happen. MCP handles the how.</p>
</blockquote>

<h2 id="benefits-of-mcp-for-ai-agents">Benefits of MCP for AI Agents</h2>

<p>MCP gives AI agents the flexibility and intelligence to grow beyond fixed automation—enabling them to explore, understand, and apply tools in dynamic environments.</p>

<ul>
  <li><strong>Dynamic Tool Discovery:</strong> Agents can learn about and use new tools without explicit programming.</li>
  <li><strong>Human-like Tool Usage:</strong> Agents leverage tools based on their “understanding” of the tool’s purpose and capabilities, similar to how a human learns to use a new application.</li>
  <li><strong>Enhanced Functionality &amp; Adaptability:</strong> Unlocks a vast ecosystem of capabilities for autonomous agents.</li>
</ul>

<blockquote>
  <p><em>To act effectively, agents also need character—a defined role, a point of view, a way to think.</em></p>
</blockquote>

<h2 id="shape-agent-behavior-through-prompting">Shape Agent Behavior Through Prompting</h2>

<p>Prompts are textual instructions or context provided to guide the agent’s behavior and reasoning. They are crucial for controlling and directing autonomous agents.</p>

<ul>
  <li>
    <p><strong>System Prompts:</strong> Define the agent’s identity, role, tone, and reasoning strategy. This is its operating character—guiding how it thinks across all interactions. <em>Example: “You are a manufacturing agent that monitors vibration data and applies SPC rules to detect risk.”</em></p>
  </li>
  <li>
    <p><strong>User/Agent Prompts:</strong> Deliver instructions in the moment. These guide the agent’s short-term focus and task-specific reasoning. <em>Example: “Analyze this new sample and let me know if we’re trending toward a shutdown.”</em></p>
  </li>
</ul>
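
<p>A minimal sketch of how the two prompt layers can be assembled as chat messages; the exact message format depends on the LLM client you use:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># System prompt: the agent's operating character
system_prompt = (
    "You are a manufacturing agent that monitors vibration data "
    "and applies SPC rules to detect risk."
)

# User prompt: the task at hand
user_prompt = (
    "Analyze this new sample and let me know if we're trending "
    "toward a shutdown: 9.8"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# messages would be passed to your model client of choice
print(messages)
</code></pre></div></div>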

<blockquote>
  <p>How do I get started?</p>
</blockquote>

<h2 id="getting-started-with-ai-agents-the-tech-stack">Getting Started with AI Agents: The Tech Stack</h2>

<p>To build your first AI agent, these tools offer a powerful foundation—though not the only options, they represent a well-integrated, production-ready ecosystem:</p>

<ul>
  <li>
    <p>LangChain: Core framework for integrating tools, memory, vector databases, and APIs. Think of it as the foundation that gives your agent capabilities.</p>
  </li>
  <li>
    <p>LangGraph: Adds orchestration and state management by turning your LangChain components into reactive, stateful workflows—ideal for agents that need long-term memory and conditional behavior.</p>
  </li>
  <li>
    <p>LangSmith: Monitoring and evaluation suite to observe, debug, and improve your agents—see how prompts, memory, and tools interact across sessions.</p>
  </li>
  <li>
    <p>n8n: No-code orchestration platform that lets you deploy agents into real-world business systems—perfect for automation without touching code.</p>
  </li>
</ul>

<p><img src="../../assets/2025/ozkary-autonomous-ai-agent-langChain-LangGraph.jpg" alt="Autonomous AI Agents a Primer's Guide langChain LangGraph" /></p>

<p>Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!</p>

<ul>
  <li><a href="https://gdg.community.dev/gdg-broward-county-fl/">Google Developer Group</a></li>
  <li><a href="https://github.com/ozkary">GitHub</a></li>
  <li><a href="https://x.com/ozkary">Twitter</a></li>
  <li><a href="https://www.youtube.com/@ozkary">YouTube</a></li>
  <li><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></li>
</ul>

<p>👍 Originally published by <a href="https://www.ozkary.com">ozkary.com</a></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="AI" /><category term="Model Context protocol" /><category term="Entrepreneurs" /><category term="AI Agents" /><category term="LangChain" /><category term="LangGraph" /><summary type="html"><![CDATA[What’s the AI agent mystique? Are they just chatbots with automation? What makes them different—and why does it matter?]]></summary></entry><entry><title type="html">Restore VS Code After Windows Updates Remove It</title><link href="https://www.ozkary.dev/restore-vscode-after-windows-update-remove-it/" rel="alternate" type="text/html" title="Restore VS Code After Windows Updates Remove It" /><published>2025-06-01T00:00:00-04:00</published><updated>2025-06-01T09:00:00-04:00</updated><id>https://www.ozkary.dev/restore-vscode-after-windows-update-remove-it</id><content type="html" xml:base="https://www.ozkary.dev/restore-vscode-after-windows-update-remove-it/"><![CDATA[<h1 id="overview">Overview</h1>

<p>Windows updates are meant to improve system stability, but occasionally they <strong>restructure important folders</strong>, leading to unexpected issues. One problem some users have encountered is <strong>VS Code files being moved</strong> to a mysterious <code class="language-plaintext highlighter-rouge">_</code> folder inside its installation directory. If this happens to you, don’t worry: <strong>you can restore VS Code easily</strong> with a simple script!</p>

<p><img src="../../assets/2025/ozkary-restore-vscode-files-after-windows-update.jpg" alt="Restore VSCode files after windows update remove it" /></p>

<h2 id="understanding-the-issue">Understanding the Issue</h2>

<p>After certain Windows updates, your <strong>VS Code installation folder</strong> (<code class="language-plaintext highlighter-rouge">C:\Users\{YourUsername}\AppData\Local\Programs\Microsoft VS Code</code>) may contain a subfolder called <code class="language-plaintext highlighter-rouge">_</code>. Instead of properly maintaining the installation structure, the update <strong>isolates essential VS Code files</strong> within this <code class="language-plaintext highlighter-rouge">_</code> folder, making it difficult for the application to launch correctly.</p>

<h2 id="how-to-fix-it-manually">How to Fix It Manually</h2>
<ol>
  <li>Open <strong>File Explorer</strong> and navigate to:</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C:<span class="se">\U</span>sers<span class="se">\{</span>YourUsername<span class="o">}</span><span class="se">\A</span>ppData<span class="se">\L</span>ocal<span class="se">\P</span>rograms<span class="se">\M</span>icrosoft VS Code
</code></pre></div></div>

<ol>
  <li>If you see a <code class="language-plaintext highlighter-rouge">_</code> folder, open it.</li>
  <li><strong>Move all its contents</strong> back to the parent directory.</li>
  <li>Restart <strong>VS Code</strong> to ensure everything works normally.</li>
</ol>

<h2 id="automate-the-fix-with-a-script">Automate the Fix with a Script</h2>
<p>If you want a <strong>one-click solution</strong>, this batch script will <strong>detect the misplaced files</strong>, prompt you for confirmation, and move them back automatically:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@echo off
setlocal

:: <span class="o">==============================================================</span>
:: Restore VS Code After Windows Updates Remove It
:: <span class="o">==============================================================</span>
:: Some Windows updates mistakenly move VS Code files into a <span class="s2">"_"</span> 
:: subfolder inside its main installation directory. This script 
:: checks <span class="k">if </span>the folder exists and prompts the user before restoring 
:: the files to the correct location.
:: <span class="o">==============================================================</span>

:: Define the VS Code installation directory
<span class="nb">set</span> <span class="s2">"vscodeDir=%USERPROFILE%</span><span class="se">\A</span><span class="s2">ppData</span><span class="se">\L</span><span class="s2">ocal</span><span class="se">\P</span><span class="s2">rograms</span><span class="se">\M</span><span class="s2">icrosoft VS Code"</span>

:: Define the misplaced folder path
<span class="nb">set</span> <span class="s2">"underscoreDir=%vscodeDir%</span><span class="se">\_</span><span class="s2">"</span>

:: Check <span class="k">if </span>the <span class="s2">"_"</span> directory exists
<span class="k">if </span>not exist <span class="s2">"%underscoreDir%"</span> <span class="o">(</span>
 <span class="nb">echo </span>No misplaced files found. Nothing to fix!
 <span class="nb">exit</span> /b
<span class="o">)</span>

:: Prompt user <span class="k">for </span>confirmation
<span class="nb">echo </span>A misplaced folder <span class="o">(</span><span class="s2">"_"</span><span class="o">)</span> was found inside the VS Code installation directory.
<span class="nb">set</span> /p <span class="nv">userInput</span><span class="o">=</span>Do you want to move its contents back to the parent folder? <span class="o">(</span>Y/N<span class="o">)</span>: 

:: Case-insensitive comparison <span class="o">(</span>/I<span class="o">)</span> accepts y or Y
<span class="k">if</span> /I not <span class="s2">"%userInput%"</span><span class="o">==</span><span class="s2">"Y"</span> <span class="o">(</span>
 <span class="nb">echo </span>Operation canceled.
 <span class="nb">exit</span> /b
<span class="o">)</span>

:: Move files back to the parent directory
<span class="nb">echo </span>Moving files back to parent directory...
move <span class="s2">"%underscoreDir%</span><span class="se">\*</span><span class="s2">"</span> <span class="s2">"%vscodeDir%"</span>
<span class="nb">echo </span>Done! The misplaced files have been restored.

endlocal

</code></pre></div></div>

<h2 id="how-to-use-the-script">How to Use the Script</h2>
<ul>
  <li>Copy the code into Notepad.</li>
  <li>Save it as <code class="language-plaintext highlighter-rouge">restore_vscode.bat</code> (make sure it’s saved as All Files, not a .txt file).</li>
  <li>Run the script by right-clicking and selecting <strong>Run as administrator</strong>.</li>
  <li>If the <code class="language-plaintext highlighter-rouge">_</code> folder exists, the script will ask for confirmation before moving the files.</li>
  <li>Press Y and hit Enter to restore your VS Code files.</li>
</ul>

<h2 id="automating-the-process-for-future-updates">Automating the Process for Future Updates</h2>

<p>If you find this problem recurring after every update, consider automating the fix:</p>

<ul>
  <li>Task Scheduler: Set up a scheduled task to run this script after each Windows update.</li>
  <li>Startup Folder: Place the script in the Windows startup directory so it runs on boot.</li>
</ul>

<p>By using this script, you’ll save time and frustration, ensuring VS Code remains fully functional after every Windows update.</p>

<p>Thanks for reading, and follow me for more technical articles, videos, and podcasts!</p>

<ul>
  <li><a href="https://github.com/ozkary">GitHub</a></li>
  <li><a href="https://x.com/ozkary">Twitter</a></li>
  <li><a href="https://www.youtube.com/@ozkary">YouTube</a></li>
  <li><a href="https://bsky.app/profile/ozkary.bsky.social">BlueSky</a></li>
</ul>

<p>👍 Originally published by <a href="https://www.ozkary.com">ozkary.com</a></p>]]></content><author><name>Oscar D. Garcia - Ozkary</name></author><category term="windows" /><category term="github" /><category term="vscode" /><summary type="html"><![CDATA[After a Windows update, is VS Code removed from your computer? If this happens to you, don’t worry the files are not gone, you can restore VS Code easily with a simple script!]]></summary></entry></feed>