Vibe coding is diving into AI-driven development with a hazy idea and no plan. While it feels productive at first, it often leads to a shaky foundation that makes future development messy and frustrating. Without a clear plan, the AI can seem forgetful, producing inconsistent code and logic. This leads to scope creep, bugs, and an unfinished product that misses the original goal.
Instead of building on a vibe, build from a plan. This is the purpose of the Multi-AI Iterative Development (MAID) framework, a systematic method for creating robust, scalable applications with AI. The framework is built on four distinct phases: a structured dialogue that turns a vision into concrete requirements, a machine-readable specification that becomes the AI's single source of truth, test-driven implementation with a specialized coding AI, and systematic, context-driven debugging.
This principle of using the right tool for the job mirrors how professionals already work. The 2025 Stack Overflow Developer Survey shows that while general-purpose models like ChatGPT are a common starting point, developers strategically use other models for specific tasks. They turn to tools like Gemini for complex reasoning and planning, and models like Claude Sonnet for code implementation.
The first phase is a structured dialogue. This isn’t a casual chat; it’s a focused conversation that prevents “vibe coding” by turning a high-level vision into a concrete plan. Use a conversational AI like ChatGPT as a partner to move from the abstract to the concrete.
This process adapts to you. A non-technical founder might spend more time on the Why, while a senior engineer can focus on the How. The goal is always the same: clarity.
The conversation follows a simple funnel: from the high-level Why, to the functional What, and finally to the technical How.
1
The Why (High-Level Vision)
Focus entirely on the human element. This stage defines the user’s pain point and the ultimate goal, using simple, non-technical language.

Key Questions to Explore:
Who is this for?
Is this an internal tool or a public-facing product?
What specific problem does this solve for them?
What does success look like from their perspective?
2
The What (Functional Requirements)
Translate the vision into tangible features and behaviors. You’re defining what the system must do without deciding on the technology yet.

Key Questions to Explore:
What are the core features needed for an MVP?
What kind of data will the system handle?
Are there critical needs like speed, security, or scale?
What are the aesthetic or design guidelines (e.g., “minimal, functional,” “warmer tones”)?
3
The How (Technical Exploration)
Once the “what” is clear, explore the “how.” A non-technical person can ask the AI for suggestions, while a technical person can validate their own ideas.

Key Questions to Explore:
Based on our needs, what technologies or stacks are a good fit?
What are the trade-offs between different frameworks?
Let’s see this dialogue in action for a user building an internal web scraping tool.
The dialogue begins with a vague idea...
User: “I need a way to get data from websites.”
The AI probes for the "Why"
AI: “Okay, let’s explore that. What’s the main goal? What will you do with this data once you have it?”
The user reveals the core motivation
User: “I want to feed it into another AI to train a custom model for my business. The data needs to be high quality.”
The AI defines the "What"
AI: “Got it. That means accuracy and structure are key. What kind of websites are we talking about? Simple articles, or complex pages where you have to click to see information?”
The user provides critical details
User: “Both, actually. Some are simple blogs, but others are product pages where prices and reviews appear dynamically.”
The AI suggests the "How"
AI: “Perfect. So we need an internal tool that can scrape both static and dynamic sites and output clean, structured data. Based on that, we should explore technologies that can control a browser, like Playwright, combined with a library like Pydantic to enforce data structure. Does this sound like the right direction?”
The user asks a final, informed question
User: “Yes, that’s exactly it. What are the pros and cons of those tools you mentioned compared to others?”
The output of this phase is a dialogue transcript containing clear answers to your project’s “Why,” “What,” and “How.” This transcript is the raw material for the next phase.
In this phase, you’ll formalize the discoveries from your dialogue into a machine-readable specification that serves as the AI’s single source of truth. This detailed plan prevents AI amnesia and scope creep.
1
Index Key Documentation
First, gather comprehensive documentation for your chosen libraries and frameworks.
For fine-grained control, use gpt-crawler to create a docs folder containing custom knowledge files from documentation websites.
Set up the crawler: git clone the repo and run npm i.
Configure config.ts: Define the target url, a match pattern, and a CSS selector for the main content (see the sketch below).
Run the crawler: npm start to generate a JSON file in the docs folder.
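To make these steps concrete, here is a rough sketch of what config.ts might contain when crawling a documentation site. The URL, match pattern, and selector are placeholders, and the exact field names should be verified against the gpt-crawler README for the version you install.

```ts
// config.ts — illustrative values only; verify field names against the
// gpt-crawler version you are using.
import { Config } from "./src/config";

export const defaultConfig: Config = {
  url: "https://playwright.dev/docs/intro",     // starting page
  match: "https://playwright.dev/docs/**",      // which links to follow
  selector: "article",                          // CSS selector for the main content
  maxPagesToCrawl: 50,                          // safety limit
  outputFileName: "docs/playwright-docs.json",  // lands in your docs folder
};
```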
2
Create the Product Requirements Document
Next, translate your dialogue transcript into a detailed Product Requirements Document (PRD). Use a long-context AI like Gemini or a local model as a partner to help structure your decisions into a formal PRD.md file.
The PRD is a living document. It serves as the definitive guide for what to build at any given time.
A strong PRD codifies the answers from your research into three core sections:
The Vision & User Experience (The 'What')
This section formalizes the functional requirements from your dialogue.
Defines the end-user and the problem being solved.
Outlines the ideal user journey and core features.
Specifies aesthetic and design guidelines.
The Technical Plan (The 'How')
This outlines the technical decisions made during your research.
Details the system architecture and project structure.
Specifies the key data models and schemas (e.g., using Pydantic).
Establishes non-negotiable architectural principles like TDD and AI-consumable logging.
The Definition of Done
This section defines how you’ll measure success.
Lists the specific unit and end-to-end tests that must pass.
Defines key metrics for success (e.g., speed, accuracy, reliability).
Sets the final criteria for project completion.
Example PRD
````markdown
# **Product Requirements Document: Internal AI Training Data Scraper**

## **1. Vision and Strategy**

### **1.1. Project Objective**
To build a reliable, configuration-driven internal tool that automates the scraping of web data from both static and dynamic websites. The system's primary goal is to produce high-quality, structured, and validated data suitable for training AI models.

### **1.2. Problem Statement & User Persona**
- **Problem:** Gathering high-quality training data is a manual, slow, and error-prone process.
- **Primary User Persona: "The ML Engineer"**
  - **Needs:** A dependable, automated way to source clean, schema-compliant data.
  - **Pain Points:** Models failing due to inconsistent input data; wasting time on data cleaning.

### **1.3. Mandatory Principles**
- **Schema-First with Pydantic:** All data output **must** be validated through a Pydantic model.
- **Test-Driven Development (TDD):** All core logic must have corresponding unit tests.
- **Configuration-Driven:** The scraping process is defined in external YAML files. No hardcoding.
- **Asynchronous Core:** The application **must** be built on Python's `asyncio`.
- **AI-Consumable Logging:** All logs must be structured (JSON) and support token-batching for AI analysis.

---

## **2. System Architecture**

### **2.1. Project Directory Structure**
- `src/`: Main application source code (`models/`, `scrapers/`, `services/`).
- `tests/`: All unit and integration tests.
- `config/`: User-facing configuration files (`targets.yaml`).
- `output/`: Destination for scraped data (`data/`) and error logs (`errors/`).
- `logs/`: Application logs, structured for AI consumption.
- `run.py`: Command-line interface (CLI) entry point.

### **2.2. Master Orchestrator Lifecycle (`main.py`)**
The orchestrator is the central script that manages the entire process.
1. **Initialization:**
   - Loads the scraping configurations from `config/targets.yaml`.
   - Initializes core services (e.g., `LoggingService`, `StorageService`).
2. **Concurrent Execution:**
   - Creates an `asyncio` event loop.
   - For each target defined in the configuration, it creates and schedules a `ScraperWorker` task.
   - Uses `asyncio.gather` to run all scraping tasks concurrently, with a configurable concurrency limit.
3. **Graceful Shutdown:** Ensures all pending tasks are completed and files are closed properly upon termination.

---

## **3. Core Functional Requirements**

### **3.1. User Configuration (`config/targets.yaml`)**
The system's behavior is entirely controlled by this file.

```yaml
# config/targets.yaml
concurrency_limit: 10  # Max number of scrapers to run at once.
targets:
  - name: 'TechBlog'
    url: 'https://example-tech-blog.com/posts/latest'
    model_name: 'BlogPost'  # Corresponds to a Pydantic model in src/models/
    output_file: 'techblog_posts.jsonl'
  - name: 'EcommerceProductPage'
    url: 'https://example-store.com/products/widget-pro'
    model_name: 'ProductPage'
    output_file: 'ecommerce_products.jsonl'
```

### **3.2. Pydantic Data Models (`src/models/`)**
This directory defines the required data schemas.

```python
# src/models/blog.py
from pydantic import BaseModel, HttpUrl
from datetime import datetime

class BlogPost(BaseModel):
    title: str
    author: str
    publication_date: datetime
    url: HttpUrl
    content_length: int

# src/models/ecommerce.py
from pydantic import BaseModel, HttpUrl, Field

class ProductPage(BaseModel):
    product_name: str
    sku: str
    price: float = Field(gt=0)
    url: HttpUrl
    in_stock: bool
```

### **3.3. Foundational Services**
- **`ConfigService`:** Loads and validates `targets.yaml`.
- **`StorageService`:** Provides thread-safe methods to append validated Pydantic models (as JSON) to the correct output file in `output/data/`. Also handles saving raw HTML on validation failure to `output/errors/`.
- **`LoggingService`:** Configures structured JSON logging for the entire application, with configurable output for AI consumption.

### **3.4. Scraper Worker Workflow**
A single, generic `ScraperWorker` will be responsible for executing a scrape task.
1. **Initialization:** Takes a single target configuration object (e.g., for 'TechBlog').
2. **Browser Automation:** Launches a Playwright instance, navigates to the target `url`.
3. **Data Extraction:** Uses a combination of CSS selectors and logic to extract the raw data fields required by its assigned `model_name`.
4. **Validation & Storage:**
   - Attempts to instantiate the Pydantic model (e.g., `BlogPost(**raw_data)`).
   - **On Success:** Passes the validated model instance to the `StorageService`.
   - **On `ValidationError`:** Catches the exception, logs the detailed validation error, and instructs the `StorageService` to save the page's raw HTML for debugging.

---

## **4. User Interface and End-to-End Validation**

### **4.1. Command-Line Interface (CLI)**
The primary user interface will be a simple but powerful CLI using a library like `Typer` or `argparse`.
- `python run.py all`: Runs the scraper for all targets defined in `targets.yaml`.
- `python run.py single --name TechBlog`: Runs the scraper for only a single, named target.
- `python run.py list-targets`: Prints a list of all configured targets.

### **4.2. End-to-End (E2E) Test Suite**
E2E tests will use `Pytest` and `pytest-playwright` to validate complete workflows against local mock HTML files.
- **Key Scenarios to Test:**
  1. **"Happy Path":** `run.py single` successfully scrapes a mock HTML file, validates the data against a Pydantic model, and writes a single line to the correct JSONL file.
  2. **Validation Failure:** The scraper attempts to scrape a malformed mock HTML file, a `ValidationError` is raised, a detailed error is logged, and the raw HTML is saved to the `output/errors` directory.
  3. **Concurrency Test:** `run.py all` with multiple mock targets runs concurrently and produces the correct output for all successful targets.
  4. **404 Not Found:** A target URL points to a non-existent page; the system logs the error gracefully and moves on.

---

## **5. Error Handling and Resiliency**

### **5.1. Tiered Failure Response Policy**

| Error Scenario | Agent/Service | Automated Response |
| :--- | :--- | :--- |
| **URL Not Found (404)** | `ScraperWorker` | Log `ERROR` with URL. Update status. Move to next target. |
| **Playwright Timeout** | `ScraperWorker` | Log `WARNING`. Retry navigation once. If it fails again, log `ERROR` and move on. |
| **Pydantic `ValidationError`** | `ScraperWorker` | **Critical Path:** Log `ERROR` with the *full validation error message*. Save raw page HTML to `output/errors/`. |
| **Missing Required Element** | `ScraperWorker` | Log `ERROR` (e.g., "Selector for 'title' not found"). Save raw page HTML. |

---

## **6. Non-Functional Requirements (NFRs)**
- **Performance:** The system should support the `concurrency_limit` defined in the config without significant performance degradation.
- **Reliability:** Designed to be run unattended (e.g., as a nightly cron job).
- **6.3. Observability and Logging:**
  - All log output **must** be in a structured JSON format to facilitate machine parsing.
  - The `LoggingService` must provide two user-configurable output modes for optimal AI consumption:
    1. **`single_file`**: All logs from a run are appended to a single `run.log` file.
    2. **`token_batched`**: The logging stream is split into multiple files (`batch_1.log`, `batch_2.log`, etc.) in the `logs/` directory. Each file must be kept under a configurable token limit (e.g., 8,000 tokens) to ensure it can be easily fed into an AI context window for debugging.

---

## **7. Out of Scope**
- A graphical user interface (GUI). The CLI is sufficient.
- Storing scraped data in a relational database. JSONL files are the required output format.
- Automated CAPTCHA or Cloudflare solving.
- Distributed scraping across multiple machines.

---

## **8. Success Criteria**
- **TDD Compliance:** Core logic (`services`, `scrapers`) achieves >90% unit test coverage.
- **Configuration-Driven:** The tool runs successfully, driven entirely by the `targets.yaml` file, without any code changes.
- **Data Integrity:** 100% of the data in the `output/data` directory successfully validates against its corresponding Pydantic schema.
- **Error Reporting:** For any failed scrape, a corresponding error log and raw HTML file are generated, allowing for immediate diagnosis.
- **CLI Functionality:** All defined CLI commands (`all`, `single`, `list-targets`) work as specified.
````
Prompt for PRD Creation
```markdown
**Your Role:** You are an **AI Product Architect**. Your function is to help users create a comprehensive, production-grade Product Requirements Document (PRD) by structuring the decisions they've already made.

**Your Core Objective:** To work with the user to translate their existing research and ideas into a detailed, well-structured, and actionable PRD based on a proven template.

**Your Guiding Blueprint:** The final document will be based on these core sections. You will help the user populate them one by one.
1. **Vision and Strategy** (The 'Why' and 'What')
2. **System Architecture** (The 'How')
3. **Core Functional Requirements**
4. **User Interface and End-to-End Validation**
5. **Error Handling and Resiliency**
6. **Non-Functional Requirements (NFRs)**
7. **Out of Scope**
8. **Success Criteria**

---

### Your Conversational Workflow
You will guide the user through the PRD creation process section by section. Do not ask them to reinvent the wheel; instead, prompt them to formalize the decisions from their initial research.

**Phase 1: The Foundation (Vision & Strategy)**
- **Your Task:** Start with the big picture.
- **Example Probes:**
  - "Let's start with Section 1: Vision and Strategy. Based on your research, what is the main objective of your project in one or two sentences?"
  - "Now, let's formalize the user persona. Who is the primary user, and what problem are we solving for them?"
  - "Let's define some **Mandatory Principles**. For example, your research pointed to using TDD and Pydantic. Let's write those down as non-negotiable rules."

**Phase 2: The Blueprint (System Architecture)**
- **Your Task:** Define the project's structure based on the chosen tech stack.
- **Example Probes:**
  - "Now for Section 2: System Architecture. Based on your decision to use Python and Playwright, a standard structure would be `src/`, `tests/`, and `config/`. Does that work for you?"
  - "Let's describe the main orchestrator's lifecycle from startup to shutdown."

**Phase 3: The Engine (Core Functional Requirements)**
- **Your Task:** This is the most detailed section. Break it down into smaller pieces: Configuration, Services, and Workflows.
- **Example Probes:**
  - "Let's move to Section 3: Functional Requirements. How will a user configure this system? We should design the YAML configuration files now. For example, what would a `config/filters.yaml` or `config/documents.yaml` look like for your project?"
  - "Now let's define the core services. What are the key stateless components? For example, a `DatabaseService` or a `QueryBuilderService`. What is the exact responsibility of each?"
  - "What are the primary workflows or 'agents' in your system? Let's describe what triggers them and what their step-by-step process is."

**Phase 4: The Experience (UI & Testing)**
- **Your Task:** Define how the user interacts with the system and how you'll validate it.
- **Example Probes:**
  - "For Section 4, let's talk about the User Interface. If there is one, what are the key screens or components? For instance, a dashboard, a configuration editor, or a log viewer?"
  - "How will we know the entire system works end-to-end? Let's list 3-5 critical E2E test scenarios, like the 'happy path' or a specific failure case."

**Phase 5: The Safety Net (Error Handling & Resiliency)**
- **Your Task:** Plan for when things go wrong.
- **Example Probes:**
  - "Great projects plan for failure. In Section 5, let's define how to handle errors. Is there a need for a Human-in-the-Loop (HITL) strategy for problems the AI can't solve, like a CAPTCHA?"
  - "Let's create a tiered failure policy. For a specific error, like 'API rate limit reached,' what is the automated response? What is the manual response?" (Suggest creating a table like the example.)

**Phase 6 and Beyond: Finalizing the Document**
- **Your Task:** Guide the user through the remaining sections (NFRs, Out of Scope, Success Criteria) to complete the PRD.
- **Example Probes:**
  - "We're almost done! For Section 6: Non-Functional Requirements, what are the key performance or reliability targets? (e.g., 'average response time < 500ms'). A critical NFR is also the logging strategy for AI debugging. Do you need a single log file or token-batched logs?"
  - "To prevent scope creep, let's define what's explicitly **Out of Scope** for this project in Section 7."
  - "Finally, in Section 8, how will we measure success? Let's list the criteria that must be met for the project to be considered complete. This should tie back to our requirements."

---

### Guiding Principles for the AI
- **Maintain a Living Document:** After each phase, present the updated PRD draft within a markdown block for the user's review.
- **Be a Structured Partner:** You are not just a scribe; you are an architect. If a user's request is vague, ask clarifying questions to fit it into the structured format.
- **Leverage the Example's Strength:** Proactively suggest structures from the example, such as using YAML for configs, creating tables for error handling, and defining a clear database schema.
- **Always Tie Back to the "Why":** Continuously connect detailed requirements back to the user's initial problem statement to ensure the project stays focused.
```
3
Designing for AI Observability
A critical architectural decision is how your application generates logs. In the MAID framework, logs are first-class context for AI. Mandate in your PRD that a centralized LoggingService must produce structured (JSON) logs. This service should support at least two AI-friendly modes:
single_file: All logs from an execution are streamed into one file for a complete overview.
token_batched: Logs are split into smaller files, each under a specific token limit. This is powerful for feeding a focused log batch into an AI for debugging without exceeding its context window.
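Neither mode requires anything exotic. The sketch below shows one possible shape for the token_batched mode, using a rough four-characters-per-token estimate to decide when to roll over to a new batch file; the class name and the heuristic are illustrative assumptions, not requirements from the PRD.

```python
# Illustrative sketch of a token-batched JSON log writer.
# The ~4 chars-per-token heuristic and the class name are assumptions, not from the PRD.
import json
import time
from pathlib import Path


class TokenBatchedLogWriter:
    def __init__(self, log_dir: Path = Path("logs"), token_limit: int = 8000):
        self.log_dir = log_dir
        self.token_limit = token_limit
        self.batch_index = 1
        self.tokens_in_batch = 0
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _estimate_tokens(self, text: str) -> int:
        # Crude approximation: roughly four characters per token.
        return max(1, len(text) // 4)

    def write(self, level: str, message: str, **fields) -> None:
        record = json.dumps({"ts": time.time(), "level": level, "message": message, **fields})
        tokens = self._estimate_tokens(record)
        if self.tokens_in_batch + tokens > self.token_limit:
            # Roll over to a new batch file once the limit would be exceeded.
            self.batch_index += 1
            self.tokens_in_batch = 0
        path = self.log_dir / f"batch_{self.batch_index}.log"
        with path.open("a", encoding="utf-8") as f:
            f.write(record + "\n")
        self.tokens_in_batch += tokens
```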
4
Establish Project Rules
Finally, create a file (CLAUDE.md or PROJECT_RULES.md) that outlines the non-negotiable coding standards. This file prevents the AI from deviating from your plan.
This is a living document that should be updated as new architectural decisions are made.
Here’s a simple example for our web scraping project:
```markdown
# Project: Internal Web Scraper

## 1. Core Technologies & Architecture
- **Tech Stack**: Python 3.11+, Playwright, Pydantic, Pytest
- **Package Manager**: UV. Managed via `pyproject.toml`.
- **Architecture**: Asynchronous, configuration-driven CLI tool.

## 2. Non-Negotiable Rules
- **Testing**: All new logic must have unit tests.
- **Logging**: All log output must be structured JSON, supporting AI-consumable modes (`single_file`, `token_batched`) as defined in the PRD.
- **Data Validation**: All data must be validated by a Pydantic model. No exceptions.
- **Git**: Commits must follow the Conventional Commits specification.
```
Once complete, your Context Library should contain the indexed documentation in your docs/ folder, your PRD.md, and your CLAUDE.md (or PROJECT_RULES.md) rules file.
With a detailed plan, you can now move from planning to implementation. Instead of writing piecemeal prompts, you can provide the entire PRD.md and have the AI generate the application skeleton in a single pass. Provide the context library to a specialized coding AI (like Claude Code) whose sole job is to translate your plan into clean, validated code.
A key feature of Claude Code is its ability to automatically detect and apply rules from a CLAUDE.md file in your project’s root. This means your prompts can be more direct, focusing on the what (from the PRD) while trusting the AI to handle the how (from the rules file) without being explicitly told.
Depending on your project’s scale, choose one of two primary strategies:
Holistic (Most Projects)
Modular (Larger Projects)
This is the fastest way to get started. Provide the AI with the entire Context Library (PRD.md, CLAUDE.md, and all docs/* files) and instruct it to generate the complete initial codebase. This approach is highly effective for new projects or applications of small-to-medium complexity.
A key strength of this framework is its test-driven feedback loop. By instructing the AI to write tests from the beginning, you create a system where the AI can build, verify, and self-correct. This transforms your role from a line-by-line coder to a strategic reviewer. The cycle is simple but effective, and it is a systematic collaboration between you and the AI:
AI Generates and Self-Corrects: Based on the PRD and your prompt, the AI generates the feature code and corresponding tests. It then enters an autonomous loop, running its own tests and fixing simple bugs (like syntax errors or missing imports) until its own quality checks pass.
You Review the Implementation: Once the AI’s automated pass is complete, you step in. Review the code not just for correctness, but against the project’s true requirements: Does it solve the problem defined in the PRD? Is the user experience right? Are there edge cases the AI missed?
Approve or Iterate:
If the feature is approved, it’s complete.
If it needs changes, provide specific, corrective feedback. This feedback becomes a new, more refined prompt that kicks off the cycle again, ensuring the AI’s next attempt is better than the last.
This loop automates the tedious parts of development, allowing the AI to handle initial drafting and bug-fixing while you focus on high-level architecture and final validation.
User: “Using the attached PRD.md as your guide, generate the entire initial project structure and the complete code for all foundational services: ConfigService, StorageService, and LoggingService. Also, generate the corresponding pytest unit tests for each service.”
The AI confirms its context and executes the plan
AI: “Understood. I have loaded your PRD.md and will apply the architectural rules and standards found in your CLAUDE.md file. I will now generate the initial project structure, all specified foundational services, and their corresponding unit tests. Here is the output:”
The AI generates the code exactly as specified
Example AI-Generated Output: “Here is the directory structure based on Section 2.1 of the PRD:”
“And here is the code for src/services/storage_service.py, adhering to the PRD’s requirements and the coding standards from CLAUDE.md (type hints, docstrings, structured logging):”
src/services/storage_service.py
```python
import asyncio
import json
import logging
from pathlib import Path
from typing import Dict, Any

import aiofiles
from pydantic import BaseModel

# Get the logger configured by LoggingService
logger = logging.getLogger(__name__)


class StorageService:
    """
    Provides async, thread-safe methods for storing validated data and error logs.
    """

    def __init__(self, output_dir: Path = Path("output")):
        self.output_dir = output_dir
        self.data_dir = self.output_dir / "data"
        self.error_dir = self.output_dir / "errors"
        self._locks: Dict[str, asyncio.Lock] = {}
        self._create_directories()

    def _create_directories(self) -> None:
        """Creates the necessary output directories if they don't exist."""
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.error_dir.mkdir(parents=True, exist_ok=True)

    async def _get_lock(self, filename: str) -> asyncio.Lock:
        """Lazily creates and returns a lock for a given filename."""
        if filename not in self._locks:
            self._locks[filename] = asyncio.Lock()
        return self._locks[filename]

    async def save_validated_data(self, model_instance: BaseModel, output_file: str) -> None:
        """
        Appends a validated Pydantic model as a JSON line to the specified file.
        """
        path = self.data_dir / output_file
        lock = await self._get_lock(output_file)
        async with lock:
            try:
                async with aiofiles.open(path, mode='a', encoding='utf-8') as f:
                    await f.write(model_instance.model_dump_json() + '\n')
                logger.info(
                    "Successfully saved data to %s",
                    output_file,
                    extra={"url": str(getattr(model_instance, 'url', 'N/A'))}
                )
            except Exception:
                logger.error("Failed to write to file %s", output_file, exc_info=True)

    async def save_error_html(self, html_content: str, failed_url: str) -> None:
        """
        Saves the raw HTML of a page that failed validation for debugging.
        """
        sanitized_name = "".join(c if c.isalnum() else "_" for c in failed_url)
        filename = f"error_{sanitized_name}.html"
        path = self.error_dir / filename
        try:
            async with aiofiles.open(path, mode='w', encoding='utf-8') as f:
                await f.write(html_content)
            logger.warning("Saved raw HTML for failed validation to %s", filename)
        except Exception:
            logger.error("Failed to save error HTML for %s", failed_url, exc_info=True)
```
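The unit tests requested in the prompt would follow the same pattern. Below is a brief illustrative sketch of one such test (not part of the AI’s output above); it assumes the src/ layout from the PRD, pytest’s built-in tmp_path fixture, and the pytest-asyncio plugin.

```python
# tests/test_storage_service.py — illustrative sketch only.
# Assumes the StorageService shown above, the src/ layout from the PRD,
# and the pytest-asyncio plugin.
import json

import pytest
from pydantic import BaseModel, HttpUrl

from src.services.storage_service import StorageService


class DummyPost(BaseModel):
    title: str
    url: HttpUrl


@pytest.mark.asyncio
async def test_save_validated_data_appends_json_line(tmp_path):
    service = StorageService(output_dir=tmp_path)
    post = DummyPost(title="Hello", url="https://example.com/post")

    await service.save_validated_data(post, "posts.jsonl")

    # One validated model should produce exactly one JSON line in output/data/.
    lines = (tmp_path / "data" / "posts.jsonl").read_text().strip().splitlines()
    assert len(lines) == 1
    assert json.loads(lines[0])["title"] == "Hello"
```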
The final phase treats debugging as a systematic process, not guesswork. Instead of asking an AI, “Why is my code broken?”, you provide a curated package of information to help it diagnose and fix the root cause. Your role in this phase is to curate the right context. The AI doesn’t need to guess what the code should do; it can compare the buggy behavior against the project’s official source of truth—the PRD.md.
1
Isolate the Problem
First, reproduce the bug by running the relevant test or application workflow. Pinpoint the symptom: is it a test failure, a validation error, a browser crash, or something else?
2
Gather Precise Context
Once you know the symptom, gather the three key pieces of context the AI needs for an accurate diagnosis:
Evidence: The exact error log or test failure output. Provide the full error message and traceback.
Location: The specific code file(s) where the error is occurring. Focus the AI’s attention instead of providing the entire codebase.
Source of Truth: The expected behavior. Reference the specific section of the PRD.md or CLAUDE.md that defines how the system should have behaved.
3
Delegate the Fix to the AI
Provide the curated context to an AI with a direct command, not a vague question. The quality of the AI’s proposed solution is directly proportional to the quality of your prompt. Your prompt must contain three elements:
Goal: A clear statement (e.g., “Find the root cause and provide the corrected code.”).
Context: The evidence and location you just assembled.
Source of Truth: A direct reference to your project plan (PRD.md or PROJECT_RULES.md). This forces the AI to solve the problem based on your rules, not its own assumptions.
Let’s see this process in action. Our web scraper runs, but no data is being saved for the ‘TechBlog’ target.
1. The user isolates the problem and gathers context
User: “The scraper completed, but techblog_posts.jsonl is empty. The logs show a recurring ValidationError.

The Evidence (from logs/app.log):
{ "level": "ERROR", "message": "Pydantic validation failed for TechBlog", "error": "1 validation error for BlogPost\npublication_date\n Input should be a valid datetime..."}
The Location:
src/models/blog.py (where BlogPost is defined)
src/scrapers/blog_scraper.py (the code that extracts the date)
The Source of Truth:
Section 3.2 of our PRD.md states the publication_date field must be a datetime object.
The Goal: Please review the scraper logic and the model to find the mismatch and provide the corrected code.”
2. The AI cross-references the context to find the root cause
AI: “Understood. I have analyzed the provided context.
Diagnosis: The error log confirms the publication_date field is receiving a string that cannot be parsed directly into a datetime object. The blog_scraper.py is likely extracting a human-readable date format (e.g., “August 14, 2025”) that Pydantic doesn’t recognize by default.
Solution: I will add a custom validator to the BlogPost model in src/models/blog.py to parse this specific date format before validation occurs. This centralizes the validation logic within the model, adhering to our schema-first principle.”
3. The AI provides the corrected code
AI: “Here is the updated src/models/blog.py file with the fix:”
src/models/blog.py
```python
from pydantic import BaseModel, HttpUrl, field_validator
from datetime import datetime


class BlogPost(BaseModel):
    title: str
    author: str
    publication_date: datetime
    url: HttpUrl
    content_length: int

    @field_validator('publication_date', mode='before')
    @classmethod
    def parse_publication_date(cls, value: str) -> datetime:
        """
        Parses a string like 'August 14, 2025' into a datetime object.
        """
        if isinstance(value, str):
            try:
                # Attempt to parse the expected string format
                return datetime.strptime(value, '%B %d, %Y')
            except ValueError:
                raise ValueError(f"Invalid date format: {value}")
        return value
```
Explanation: The @field_validator decorator intercepts the incoming publication_date string. It uses datetime.strptime to parse the specific format into a valid datetime object before Pydantic performs its own validation. This resolves the error while keeping the data model robust.
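In keeping with the project’s TDD rule, a fix like this would normally ship with a regression test. A minimal sketch, assuming the src/ layout from the PRD and plain pytest:

```python
# Illustrative regression test for the date-parsing fix (assumes pytest and
# the src/ layout from the PRD).
from datetime import datetime

from src.models.blog import BlogPost


def test_blogpost_parses_human_readable_date():
    post = BlogPost(
        title="Example",
        author="Jane Doe",
        publication_date="August 14, 2025",  # the format that previously failed
        url="https://example-tech-blog.com/posts/latest",
        content_length=1200,
    )
    assert post.publication_date == datetime(2025, 8, 14)
```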