Agentic AI Interview Guide
Table of Contents
- Roadmaps & Learning Path
- Agentic AI Fundamentals
- Agent Architectures
- Tool Use & Function Calling
- Planning & Reasoning
- Memory & State Management
- AI Agent Prerequisites
- Prompt Engineering for Agents
- Building Agents & Frameworks
- Multi-Agent Systems
- Agent Evaluation & Testing
- Observability, Security & Ethics
- Real-World Applications
- Challenges & Solutions
- Interview Questions & Answers
Roadmaps & Learning Path
AI Agents roadmap and related roadmap.sh tracks
If you are preparing for AI agent interviews, use the AI Agents roadmap as the main track and then study the adjacent tracks that support production-grade agent work.
Primary roadmap:
- AI Agents interactive roadmap: https://roadmap.sh/ai-agents
Most relevant adjacent tracks:
- AI Engineer roadmap: https://roadmap.sh/ai-engineer
- AI and Data Scientist roadmap: https://roadmap.sh/ai-and-data-scientist
- MLOps roadmap: https://roadmap.sh/mlops
- AI Red Teaming roadmap: https://roadmap.sh/ai-red-teaming
- Prompt Engineering roadmap: https://roadmap.sh/prompt-engineering
- Backend roadmap: https://roadmap.sh/backend
- Git and GitHub roadmap: https://roadmap.sh/git-github
- API Design roadmap: https://roadmap.sh/api-design
- Shell / Bash roadmap: https://roadmap.sh/shell-bash
Detailed roadmap PDF links surfaced from roadmap.sh:
- AI Engineer PDF: https://roadmap.sh/pdfs/roadmaps/ai-engineer.pdf
- MLOps PDF: https://roadmap.sh/pdfs/roadmaps/mlops.pdf
- Prompt Engineering PDF: https://roadmap.sh/pdfs/roadmaps/prompt-engineering.pdf
- Backend PDF: https://roadmap.sh/pdfs/roadmaps/backend.pdf
- Git and GitHub PDF: https://roadmap.sh/pdfs/roadmaps/git-github.pdf
- API Design PDF: https://roadmap.sh/pdfs/roadmaps/api-design.pdf
Note: roadmap.sh clearly surfaced the interactive AI Agents roadmap, but an AI Agents PDF link was not surfaced in the retrieved results. Use the interactive roadmap page as the source of truth for that track.
Recommended study order
- Git and GitHub
- Shell / Bash
- Backend and API Design
- Prompt Engineering
- AI Engineer
- AI Agents
- MLOps
- AI Red Teaming
Why these prerequisite tracks matter
- Basic backend development: agents usually call APIs, manage sessions, handle retries, work with queues, and persist state.
- Git and terminal usage: agent engineers constantly inspect repos, run commands, test code, and automate workflows.
- REST API knowledge: most tool invocation is just structured API usage with validation, retries, and auth.
- Prompt Engineering: bad prompts create bad plans, tool misuse, and unstable outputs.
- MLOps: production agents need evaluation, deployment, monitoring, rollback, and cost control.
Agentic AI Fundamentals
What is Agentic AI?
Definition: Agentic AI refers to autonomous systems that use AI (particularly LLMs) as the decision-making engine to accomplish goals through iterative planning, action execution, and reasoning. Unlike traditional chatbots that respond to queries, agents actively pursue objectives.
Key Characteristics:
- Autonomy: Acts without constant human direction
- Goal-Oriented: Works toward specific objectives
- Iterative: Plans, acts, observes, reasons, repeats
- Tool-Enabled: Uses external functions/APIs
- Reasoning: Explains decisions and adapts to feedback
- State-Aware: Maintains context and history
Chat vs Agent:
Chat:
User → Query
Bot → Response
Done
Agent:
Goal → Plan → Act → Observe → Reason → Decide → Repeat until Goal Achieved
Core Loop (ReAct Framework):
┌─────────────────────────────────────┐
│ User provides goal │
└────────────┬────────────────────────┘
│
↓
┌─────────────────────────────────────┐
│ Thought: Analyze situation │
│ Decide what to do next │
└────────────┬────────────────────────┘
│
↓
┌─────────────────────────────────────┐
│ Action: Call tool/function │
│ (Web search, calc, API, etc) │
└────────────┬────────────────────────┘
│
↓
┌─────────────────────────────────────┐
│ Observation: Get result │
│ Add to context │
└────────────┬────────────────────────┘
│
├─ Goal achieved? → Return answer
└─ Need more info? → Loop to Thought
Example Agent Session:
User: "What's the current weather in Paris and how does it compare to historical averages?"
Agent:
Thought: I need current weather and historical data
Action: search_weather("Paris current weather")
Observation: Temperature 8°C, Clear skies
Thought: Now I need historical averages
Action: search_weather("Paris historical average January")
Observation: Average January temp 3.8°C
Thought: I have all needed info, can answer
Final Answer: Paris is currently 8°C (4.2°C above historical average for January)
Components of an Agent:
- LLM (Brain): Thinks, decides, reasons
- Tools: Functions to accomplish tasks
- Memory: Stores context and history
- Planner: Breaks goals into steps
- Environment: External world
- Executor: Runs actions
Why Agentic AI Matters
Advantages over Traditional Systems:
| Aspect | Traditional Software | Agentic AI |
|---|---|---|
| Flexibility | Hard-coded logic | Adaptive reasoning |
| New Tasks | Requires reprogramming | Works with new tasks naturally |
| Complexity | Complex workflows hard | Handles multi-step complexity |
| User Interaction | Limited dialogue | Natural conversation + action |
| Error Handling | Pre-defined responses | Adapts and retries |
| Learning | Static | Can improve from feedback |
Real-World Impact:
- Faster knowledge worker tasks
- Reduced manual work (50-80% time savings)
- Better decision-making
- 24/7 availability
- Consistent process execution
Agent Architectures
Q1: What are different agent architectures?
Answer:
1. ReAct (Reasoning + Acting)
Most popular architecture, interleaves thinking with action
Thought: [Analyze current state]
Action: [Choose tool and arguments]
Observation: [Tool result]
Thought: [Process observation]
Action: [Next action or final answer]
Benefits:
- Interpretable reasoning
- Flexible action selection
- Works with any LLM
Example:
User: "How much is 15% of $200?"
Thought: Need to calculate percentage
Action: calculator(200 * 0.15)
Observation: 30
Final Answer: 15% of $200 is $30
2. Chain-of-Thought (CoT)
Breaks down complex problems into steps
Problem → Step 1 → Step 2 → Step 3 → Answer
Better for reasoning-heavy tasks
Example:
Problem: "If a train travels 60mph for 3 hours, how far does it go?"
Step 1: Distance = Speed × Time
Step 2: Distance = 60 × 3
Step 3: Distance = 180
Answer: 180 miles
3. Tool Use / Function Calling
Strictly invoke predefined tools
LLM decides:
- Which tool to use
- What arguments to provide
- When to stop
tools = [
{"name": "search", "description": "..."},
{"name": "calculator", "description": "..."}
]
# LLM outputs tool call
{"tool": "calculator", "input": {"expression": "2+2"}}
4. Tree of Thoughts (ToT)
Explores multiple reasoning paths
Root (Problem)
/ | \
Path1 Path2 Path3
/ | \
Success Dead-end Promising
Useful for problems with multiple solutions
5. Hierarchical Agents
Higher-level agent delegates to sub-agents
Top Agent (Planner)
├─ Sub-Agent 1 (Document search)
├─ Sub-Agent 2 (Data analysis)
└─ Sub-Agent 3 (Report generation)
Benefits:
- Specialization
- Scalability
- Cleaner reasoning
6. Reflexion
Agent reviews its own work and improves
Act → Observe → Reflect → Improve → Act
Process:
- Execute plan
- Check results
- Identify mistakes
- Self-correct
- Retry
Example:
Plan: Search for "Python debugging"
Result: Got generic results
Reflection: Need more specific query
Improved Plan: Search "Python debugger pdb tutorial"
Better Result: Specific tutorial found
7. AutoGPT-Style
Full autonomy to create and execute tasks
Goal → Create Task List → Execute Tasks → Verify → Iterate
Process:
- Break goal into subtasks
- Execute each subtask
- Verify completion
- Create new tasks based on progress
Example:
Goal: "Build a web scraper for price comparison"
Subtasks:
- Research popular scraping libraries
- Design data structure
- Write scraper code
- Test on sample websites
- Deploy
Agent executes each, creates new subtasks as needed
Choosing Architecture:
| Task Type | Best Architecture |
|---|---|
| Simple reasoning | Chain-of-Thought |
| Tool use | ReAct |
| Complex planning | Hierarchical agents |
| Self-improvement | Reflexion |
| Multiple solutions | Tree of Thoughts |
| Full autonomy | AutoGPT-style |
Q2: What is the ReAct Framework?
Answer:
ReAct = Reasoning + Acting
Interleaves LLM reasoning with action execution
Core Principle: Let LLM think through problems while taking actions, rather than pure reasoning or pure acting
Format:
Thought: [LLM analyzes situation]
Action: [LLM chooses and executes action]
Observation: [System provides result]
[Repeat until done]
Final Answer: [LLM provides answer]
Why ReAct Works:
- Interpretability: Can see thinking process
- Error Correction: Observes feedback and adjusts
- Flexibility: Mix reasoning and acting optimally
- Robustness: Can recover from mistakes
Example Walkthrough:
User: "What is the capital of France and what's its current weather?"
Agent:
Thought: I need to find France's capital and its current weather
Action: search("capital of France")
Observation: Paris is the capital of France
Thought: Now I need current weather in Paris
Action: get_weather("Paris")
Observation: Temperature 8°C, Cloudy, Humidity 65%
Thought: I have all information needed
Final Answer: The capital of France is Paris.
Current weather: 8°C, cloudy skies, 65% humidity.
Prompt Template:
You are a helpful assistant that answers questions using reasoning and tools.
Use this format:
Thought: Do I need to use a tool? Is my previous answer final?
Action: the action to take, should be one of [search, calculator, code_executor]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {user_question}
Thought:
Key Characteristics:
- Transparent: Every step visible
- Debuggable: Can trace reasoning failures
- Adaptable: Observes and adjusts
- Scalable: Works for complex multi-step tasks
Limitations:
- Token overhead (showing all steps increases tokens)
- Token limit constraints (long reasoning chains)
- Hallucination still possible
- May be slow (multiple LLM calls)
Improvements to ReAct:
1. Chain-of-Thought ReAct: More reasoning before action
Thought: Let me think step by step...
1. First I need...
2. Then I should...
3. Finally I'll...
Action: [Most promising action first]
2. Critic-Assisted ReAct: Add critic to evaluate decisions
Action: [Proposed action]
Critic: Is this best? (yes/no/suggestion)
If no: Consider alternative
Action: [Revised action]
3. Memory-Augmented ReAct: Include relevant past experiences
Relevant past: [Similar problems solved before]
Thought: [Leverage past experience]
Action: [Informed action]
Tool Use & Function Calling
Q3: How do tools and function calling work in agents?
Answer:
Function Calling Definition: Agent uses LLM to decide which pre-defined functions to call with specific arguments.
Traditional Approach:
Hardcoded:
if user_asks_weather:
result = get_weather(location)
elif user_asks_calculation:
result = calculator.compute(expression)
Issues:
- Limited flexibility
- Hard to scale
- Brittle (exact phrases needed)
Modern Function Calling:
LLM decides function and arguments:
# Define tools
tools = [
{
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
}
},
{
"name": "search",
"description": "Search for information",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"}
}
}
}
]
# User asks
user_input = "What's the weather in Paris?"
# LLM decides
# Output: {"tool": "get_weather", "arguments": {"location": "Paris"}}
# Execute
result = get_weather("Paris")
Benefits:
- Flexibility: LLM chooses appropriate tool
- Scalability: Add new tools easily
- Natural Language: Understand user intent
- Composability: Chain multiple tools
Implementation:
# 1. Call LLM with tools
response = llm.complete(
user_message,
tools=tools,
system_prompt="You have access to these tools..."
)
# 2. Parse response
if response.has_tool_call():
tool_name = response.tool_name
arguments = response.arguments
# 3. Execute tool
if tool_name == "get_weather":
result = get_weather(**arguments)
elif tool_name == "search":
result = search(**arguments)
# 4. Add to context
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": f"Tool result: {result}"})
# 5. Continue conversation
response = llm.complete(messages, tools=tools)
Tool Categories:
1. Computation Tools
- Calculator: Math operations
- Code executor: Run Python code
- Data analyzer: Process data
2. Knowledge Tools
- Web search: Find information
- Database query: Access structured data
- API calls: External services
3. Creation Tools
- Text generator: Create content
- Image generator: Create images
- Code generator: Write code
4. Verification Tools
- Fact checker: Verify information
- Validator: Check format/correctness
- Simulator: Test scenarios
Tool Design Best Practices:
- Clear Names: “search_academic_papers” vs “search”
- Good Descriptions: Explain when to use
- Simple Interfaces: Easy arguments
- Reliable: Consistent results
- Fast: Quick execution
- Specific: Domain-focused is better
Example Tool Definition:
{
"name": "search_academic_papers",
"description": "Search for academic papers by keyword, returns title, authors, abstract, and publication date",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query terms (e.g., 'machine learning optimization')"
},
"num_results": {
"type": "integer",
"description": "Number of papers to return (default: 5, max: 20)"
},
"year_from": {
"type": "integer",
"description": "Filter papers from this year onwards (optional)"
}
},
"required": ["query"]
}
}
Common Issues & Solutions:
| Issue | Cause | Solution |
|---|---|---|
| Wrong tool | Poor description | Clearer description, examples |
| Wrong arguments | Unclear parameters | Type hints, examples in prompt |
| Tool not found | Tool not listed | Ensure tool in tools list |
| Hallucinated tools | LLM invents tools | Strict validation before execution |
| Loops/infinite calls | No stopping condition | Max tool calls, explicit finish |
Tool Composition:
Chain multiple tools:
User: "Find latest AI papers and summarize trends"
Agent:
Action 1: search_papers(query="AI trends 2024")
Observation: [Papers list]
Action 2: summarize_content([papers])
Observation: [Summary]
Final Answer: [Trends summary]
Planning & Reasoning
Q4: How do agents plan and reason through complex problems?
Answer:
Planning Definition: Breaking down goals into sub-goals and tasks, determining execution order and dependencies
Simple Planning:
Decompose goal:
Goal: Write a research paper
├─ Research topic
├─ Create outline
├─ Write sections
├─ Review and edit
└─ Format and submit
Linear execution No dependencies
Advanced Planning:
1. Task Decomposition
Break goal into independent tasks:
goal = "Analyze customer feedback for product improvement"
tasks = [
{
"id": 1,
"title": "Collect feedback",
"description": "Gather customer reviews from all platforms",
"tools": ["search_reviews", "api_call"],
"dependencies": []
},
{
"id": 2,
"title": "Categorize feedback",
"description": "Classify feedback by topic",
"tools": ["text_classifier"],
"dependencies": [1] # Depends on task 1
},
{
"id": 3,
"title": "Generate insights",
"description": "Extract actionable insights",
"tools": ["analyzer"],
"dependencies": [2] # Depends on task 2
}
]
# Execute in order respecting dependencies
2. Goal Decomposition (Hierarchical Planning)
Break big goal into sub-goals:
Goal: Build an AI chatbot
├─ Research Phase
│ ├─ Study LLM architectures
│ ├─ Review chatbot frameworks
│ └─ Analyze use cases
│
├─ Design Phase
│ ├─ Define capabilities
│ ├─ Plan dialog flow
│ └─ Design knowledge base
│
├─ Implementation Phase
│ ├─ Set up infrastructure
│ ├─ Integrate LLM
│ ├─ Build training data
│ └─ Fine-tune model
│
└─ Deployment Phase
├─ Test thoroughly
├─ Deploy to cloud
└─ Monitor performance
Each sub-goal can be tackled by specialized agent
3. Multi-Step Reasoning (Chain of Thought)
Break problem into logical steps:
Problem: "If investment returns 8% annually, how long to double money?"
Step 1: Identify rule (Rule of 72)
Step 2: Calculate (72 / 8 = 9)
Step 3: Verify answer (reasonable)
Step 4: Explain (approximately 9 years)
4. Constraint-Based Planning
Optimize while respecting constraints:
Goal: Schedule meetings for team
Constraints:
- Each person available 2 hours/week
- All 4 people must attend some meetings
- Meetings max 2 hours
- Must not conflict with existing calendar
Solution: [Generated schedule respecting all constraints]
5. Adaptive Planning
Adjust plan based on observations:
Initial Plan:
├─ Search for flights
├─ Book flight
└─ Reserve hotel
Execution:
├─ ✓ Searched flights
├─ ✗ No availability on preferred date
→ Adapt: Search nearby dates
→ Observation: Found cheaper flights 1 day earlier
→ Replan: Book earlier date
└─ Reserve hotel for new dates
Reasoning Strategies:
Forward Chaining: Start from known facts, derive new facts
Facts: A is true, A→B, B→C
Forward chain:
A true → B true (from rule A→B)
B true → C true (from rule B→C)
Conclusion: C is true
Use case: Inferring properties from data
Backward Chaining: Start from goal, work backwards to find facts
Goal: Need C
Work backwards:
C requires B (from rule B→C)
B requires A (from rule A→B)
Find: A is available
Conclusion: Can derive C from A
Use case: Finding solution path
Reasoning with Uncertainty:
Not all information certain:
Tool 1: Says "Weather is rainy" (confidence: 80%)
Tool 2: Says "Weather is clear" (confidence: 60%)
Aggregate: Most likely rainy
Confidence: Moderate (conflicting signals)
Action: "Weather probably rainy, but not certain"
Iterative Refinement:
Keep improving solution:
Iteration 1: Basic solution
Iteration 2: Validate assumptions
Iteration 3: Refine based on feedback
Iteration 4: Optimize performance
...
Final: High-quality solution
Complex Planning Example:
User: "Help me plan a 2-week vacation to Japan"
Agent Planning:
1. Gather information
- Budget constraints? (Tool: clarify with user)
- Interests? (Food, culture, nature?)
- Travel dates? (What's available?)
2. Create itinerary skeleton
- Days 1-3: Tokyo
- Days 4-6: Kyoto
- Days 7-9: Osaka
- Days 10-14: Flexible
3. Plan transportation
- International flights (to/from US?)
- Local transit (JR pass?)
- Accommodation (hotels, ryokan, airbnb?)
4. Activity planning
- Day 1 Tokyo: Arrival, rest
- Day 2 Tokyo: Shibuya, Senso-ji
- ...
5. Optimize
- Check prices
- Adjust for best value
- Ensure no conflicts
- Account for travel time
6. Verify
- All days planned? ✓
- Within budget? ✓
- All activities feasible? ✓
Final: Detailed day-by-day itinerary
Memory & State Management
Q5: How do agents manage memory and state?
Answer:
Memory Types:
1. Short-Term Memory (Context Window)
Current conversation context
User: "What's the weather?"
Assistant: "In which city?"
User: "Paris"
Assistant: [Uses context: "Paris" from previous]
Limited by context window (4K-200K tokens)
2. Long-Term Memory (Persistent)
Information retained across sessions
Session 1: User tells agent: "I'm allergic to peanuts"
Session 2: Agent remembers: "You mentioned peanut allergy"
Session N: Agent still recalls: "No peanut products for user"
Stored in database/file
3. Working Memory (Task-Specific)
Temporary state during task execution
Task: Calculate total expense
Working memory:
- Item 1: $50
- Item 2: $30
- Item 3: $25
- Total so far: $105
- Running state: ["50", "30", "25"]
Cleared after task completion
State Management:
Basic State:
agent_state = {
"conversation_id": "conv_123",
"user_id": "user_456",
"goal": "Book a flight",
"status": "in_progress",
"current_step": "searching_flights",
"context": {
"origin": "NYC",
"destination": "LA",
"date": "2025-02-20"
},
"results": [],
"errors": []
}
Execution State:
execution_state = {
"step_count": 0,
"tool_calls": [],
"observations": [],
"reasoning_trace": [],
"max_steps": 10,
"timeout": 300,
"memory_usage": "2.3GB"
}
Memory Strategies:
1. Summarization
Compress old conversations:
Original: 10K tokens of conversation
Summarize: 500 tokens summary
Store: Both (full + summary)
Retrieve: Use summary for context, full if needed
2. Retrieval-Augmented Memory
Store and retrieve relevant info:
Query: "How did we solve this before?"
Retrieve: Similar past solutions
Use: Adapt previous solution
Embedding-based retrieval:
Query embedding → Find similar memory embeddings
3. Hierarchical Memory
Different memory levels:
Level 1 (Recent): Last 5 interactions (high detail)
Level 2 (Short-term): Last 50 interactions (summaries)
Level 3 (Long-term): All history (indexed, compressed)
Retrieval strategy:
- Check Level 1 first
- If not found, check Level 2
- If not found, search Level 3
4. Episodic Memory
Remember specific events/episodes:
Episode 1: "User asked about ML in 2025-01-15"
Episode 2: "User booked flight to NYC in 2025-01-16"
Episode 3: "User complained about price in 2025-01-17"
Retrieve relevant episodes for current query
Memory Implementation:
class AgentMemory:
def __init__(self):
self.short_term = [] # Current context
self.long_term = {} # Persistent storage
self.working = {} # Task state
def add_short_term(self, message):
"""Add to current conversation"""
self.short_term.append(message)
if len(self.short_term) > MAX_CONTEXT:
# Summarize old messages
summary = self.summarize(self.short_term[:100])
self.long_term[f"summary_{time}"] = summary
self.short_term = self.short_term[100:]
def add_long_term(self, key, value):
"""Store permanent info"""
self.long_term[key] = value
def retrieve(self, query):
"""Find relevant memories"""
# Search short-term first
relevant = self.search_short_term(query)
if relevant:
return relevant
# Search long-term if needed
return self.search_long_term(query)
def get_context(self):
"""Get current context for LLM"""
return {
"recent": self.short_term[-10:],
"relevant_long_term": self.retrieve_relevant(),
"task_state": self.working
}
Challenges & Solutions:
| Challenge | Issue | Solution |
|---|---|---|
| Hallucination | Agent “remembers” false info | Verify against stored data |
| Stale info | Old knowledge becomes outdated | Periodic refresh, timestamps |
| Conflicting memories | Two sources say different | Conflict resolution, voting |
| Privacy | Remember sensitive data | Encryption, access control |
| Space | Memory grows unbounded | Pruning, archiving |
State Persistence Example:
# Save state for resumption
def save_agent_state(agent, conversation_id):
state = {
"conversation_id": conversation_id,
"memory": agent.memory.to_dict(),
"execution_state": agent.execution_state,
"timestamp": datetime.now(),
"goal": agent.current_goal
}
db.save(f"agent_{conversation_id}", state)
# Resume from checkpoint
def resume_agent(conversation_id):
state = db.load(f"agent_{conversation_id}")
agent = Agent()
agent.memory = AgentMemory.from_dict(state["memory"])
agent.execution_state = state["execution_state"]
agent.current_goal = state["goal"]
return agent
AI Agent Prerequisites
LLM fundamentals, transformers, and model mechanics
Before building agents, understand:
- what a transformer is
- how self-attention works
- tokenization and embeddings
- context windows and truncation
- token-based pricing
- inference vs training
- generation controls
- reasoning models vs standard models
- open-weight vs closed-weight models
- fine-tuning vs prompt engineering
- embeddings, vector search, and RAG
Interview Topic: Core prerequisites for building AI agents
Answer:
You need five layers of knowledge:
1. LLM Fundamentals
- Transformers and attention
- Tokenization
- Context windows
- Sampling and generation
- Instruction following limitations
2. Software Engineering Basics
- Python or another backend language
- REST APIs
- JSON schemas
- Error handling and retries
- Git, shell, and debugging
3. Data and Retrieval
- Embeddings
- Vector search
- Chunking
- Metadata filtering
- RAG design
4. Agent Systems
- tool calling
- memory
- planning loops
- evaluation
- observability
5. Production Concerns
- cost tracking
- rate limits
- prompt injection defense
- PII handling
- human oversight
Why this matters:
- Most agent failures are not model failures alone.
- They come from weak system design around tools, data, context, and recovery.
Interview Topic: Tokenization, context windows, and token-based pricing
Answer:
Tokenization:
- Models process tokens, not raw words.
- Tokens may be whole words, subwords, punctuation, or bytes depending on tokenizer.
- Cost, latency, and context usage are all token-based.
Context Window:
- The context window is the total amount of input plus generated output the model can handle in a request.
- If context is too large, the system must truncate, summarize, retrieve selectively, or fail.
Token-Based Pricing:
- Most model providers charge separately for input tokens and output tokens.
- Cached input is often cheaper than fresh input.
- Long prompts, large tool schemas, chain-of-thought verbosity, and repeated retrieval all increase spend.
Interview answer:
- Tokenization affects cost and chunking.
- Context windows affect what the model can remember in one call.
- Pricing determines whether an agent is economically viable in production.
Interview Topic: Generation controls and why they matter
Answer:
Important controls:
- Temperature: controls randomness
- Top-p: nucleus sampling; restricts token choices to the most likely cumulative mass
- Frequency penalty: discourages repeating the same tokens frequently
- Presence penalty: encourages introducing new concepts rather than repeating existing ones
- Max length / max tokens: caps response size
- Stop sequences / stopping criteria: tell the model where to stop generation
How to use them:
- Lower temperature for deterministic tool calling and extraction
- Higher temperature for ideation and creative generation
- Use max output limits to control cost and latency
- Use stop sequences for structured multi-step workflows when needed
Interview Topic: Open-weight vs closed-weight, reasoning vs standard models
Answer:
Open-weight models:
- You can download and run model weights yourself
- More control, deployment flexibility, and customization
- More operational burden
Closed-weight models:
- Provider-hosted APIs
- Easier to use
- Better managed infrastructure and often stronger proprietary performance
- Less transparency and portability
Reasoning models:
- Spend more compute or internal reasoning effort before answering
- Better for planning, math, code, and long multi-step tasks
- Usually slower and more expensive
Standard models:
- Faster and cheaper
- Better for simple transforms, chat, extraction, or routing
Interview answer:
- Choose open vs closed based on control, privacy, deployment constraints, and ops maturity.
- Choose reasoning vs standard based on task difficulty and cost tolerance.
Interview Topic: Streamed vs unstreamed responses
Answer:
Streamed responses:
- tokens arrive incrementally
- better UX for chat and interactive systems
- useful when long outputs would otherwise feel slow
Unstreamed responses:
- response arrives only after generation finishes
- simpler for backends, pipelines, and validation-first flows
Use streaming when:
- users are waiting interactively
- partial output is valuable
Avoid streaming when:
- downstream systems need the complete structured output before acting
- you need strict validation before display
Interview Topic: Fine-tuning vs prompt engineering
Answer:
Prompt engineering changes behavior by changing instructions and examples.
Fine-tuning changes model weights using training data.
Use prompt engineering when:
- tasks change often
- you need faster iteration
- format and behavior can be guided with instructions and examples
Use fine-tuning when:
- you need persistent domain behavior
- style and format must be highly consistent
- prompts alone are too long, costly, or brittle
Interview answer:
- Start with prompt engineering, retrieval, and tool design first.
- Fine-tune only when repeated evidence shows prompting is not enough.
Interview Topic: Embeddings, vector search, and RAG
Answer:
Embeddings:
- Dense numeric representations of text, code, images, or other data
- Similar items end up close in vector space
Vector Search:
- Find nearest neighbors to a query embedding
- Used for semantic retrieval rather than exact keyword matching
RAG:
- Retrieval-Augmented Generation
- Retrieve relevant external knowledge, then feed it into the model for grounded generation
Basic RAG flow:
- ingest data
- chunk it
- create embeddings
- store them in a vector database
- retrieve relevant chunks at query time
- generate an answer using retrieved context
Interview answer:
- RAG is often the first reliability upgrade before fine-tuning.
- Good retrieval quality depends on chunking, embeddings, filtering, reranking, and citation discipline.
Pricing of common models
Pricing changes frequently, so in interviews focus on pricing structure and tradeoffs, then cite official pricing pages for exact numbers.
Useful current references:
- OpenAI API pricing: https://openai.com/api/pricing/
- Anthropic pricing overview: https://docs.anthropic.com/en/docs/about-claude/pricing
- Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing?hl=en
As of the retrieved official pricing pages:
- OpenAI’s API pricing page lists per-million-token rates with separate input, cached input, and output pricing.
- Anthropic’s pricing page lists model-specific input, cache, and output pricing for Claude families.
- Google’s Gemini pricing page distinguishes free and paid tiers and notes discounted batch pricing.
Prompt Engineering for Agents
What is prompt engineering?
Prompt engineering is the discipline of shaping model behavior using instructions, examples, structure, and constraints.
Writing good prompts for agents
Best practices:
- be specific about the task
- provide relevant context
- use precise technical terms
- include examples when format matters
- specify output length and structure
- define success criteria
- iterate and test prompts
Agent-specific prompting tips:
- tell the agent when to use tools
- specify when it must ask clarifying questions
- define output schemas for tools and final answers
- state constraints like latency, cost, or safety requirements
Interview Topic: What makes a strong prompt for an AI agent?
Answer:
A strong agent prompt usually contains:
- role and objective
- available tools and when to use them
- constraints
- output schema
- examples
- failure behavior
Example:
You are a support agent. Use the CRM lookup tool before answering account-specific questions.
Never guess billing status.
If the lookup fails, ask the user to verify account ID.
Return JSON with fields: issue_type, answer, follow_up_needed.
Why this works:
- reduces hallucinations
- clarifies tool usage
- improves evaluation
- keeps downstream integration stable
Building Agents & Frameworks
AI Agents 101
Core loop:
- Perception / user input
- Reason and plan
- Acting / tool invocation
- Observation and reflection
Example use cases:
- personal assistant
- code generation
- data analysis
- web scraping and crawling
- NPC or game AI
Tools / Actions
What is a tool?
- A callable capability the agent can invoke to affect the outside world or gather information.
Good tool definition includes:
- name and description
- input schema
- output schema
- error handling
- usage examples
Common tools:
- web search
- code execution / REPL
- database queries
- API requests
- email / Slack / SMS
- file system access
Agent memory
What is agent memory?
- Mechanisms that help the agent retain useful state within a task or across tasks.
Types:
- short-term memory: conversation or current working context
- long-term memory: persistent user or system knowledge
- within-prompt memory: facts included directly in the prompt
- external memory: vector DB, SQL, graph DB, key-value store, or custom system
Maintaining memory:
- episodic memory stores prior events and interactions
- semantic memory stores facts, profiles, and stable knowledge
- summarization and compression reduce token load
- forgetting or aging strategies remove stale or low-value memory
Architecture patterns to know
- ReAct
- Chain of Thought
- Tree-of-Thought
- Planner-Executor
- DAG agents
- RAG agent
- MCP-based agent systems
Model Context Protocol (MCP)
Core MCP pieces:
- MCP host: the application running the model experience
- MCP client: the component that speaks the MCP protocol
- MCP server: the external process or service exposing tools and resources
Creating MCP servers generally requires:
- tool definitions
- resource definitions
- request handling
- auth and permission model
- deployment plan
Deployment modes:
- local desktop
- remote or cloud
Building agents manually vs using frameworks
Manual from scratch
- direct LLM API calls
- implement the agent loop yourself
- parse model output yourself
- handle retries, rate limits, and validation manually
Benefits:
- maximum control
- fewer abstractions
- easier to optimize for a narrow use case
Costs:
- more boilerplate
- more maintenance
Framework-based
- LangChain
- LlamaIndex
- Haystack
- AutoGen
- CrewAI
- Smolagents
- provider-native tool use such as Anthropic tool use or OpenAI function calling
Benefits:
- faster prototyping
- reusable components
- integrations and observability helpers
Costs:
- abstraction overhead
- debugging complexity
- framework lock-in
Interview Topic: Choosing between manual implementation and frameworks
Answer:
Choose manual implementation when:
- the workflow is narrow and critical
- latency matters
- you want strict control over prompts, retries, memory, and tool execution
Choose a framework when:
- you need to prototype quickly
- you want built-in retrieval, agents, tracing, or orchestration
- the team values speed over perfect control in early stages
Interview answer:
- Start as simple as possible.
- Frameworks should remove toil, not hide system behavior you still need to understand.
Multi-Agent Systems
Q6: How do multi-agent systems work?
Answer:
Multi-Agent System Definition: Multiple independent agents working together toward shared or related goals, communicating and coordinating.
Why Multi-Agent?
Benefits:
- Specialization: Each agent expert in domain
- Scalability: Handle larger problems
- Robustness: If one fails, others continue
- Efficiency: Parallel execution
- Flexibility: Mix and match agents
Agent Roles:
Specialist Agents:
Data Agent: Retrieves and processes data
Analysis Agent: Performs analysis
Writing Agent: Creates reports
Review Agent: Checks quality
Hierarchical Structure:
Coordinator/Manager Agent
├─ Research Agent
├─ Analysis Agent
├─ Writing Agent
└─ Review Agent
Manager coordinates all agents
Communication Patterns:
1. Direct Communication
Agent A → Message → Agent B
Agent B → Response → Agent A
2. Broadcast
Manager → Message → All agents
Agents → Responses → Manager
3. Sequential Handoff
Agent 1 (input) →
Agent 2 (process Agent 1 output) →
Agent 3 (finalize)
4. Message Queue
Agent A → Queue → Agent B reads
Agent B → Queue → Agent C reads
Decoupled communication
Example: Report Writing System
Goal: Create a market analysis report
Agents:
1. Researcher Agent
- Searches market data
- Compiles statistics
2. Analyzer Agent
- Analyzes trends
- Identifies patterns
3. Writer Agent
- Writes report sections
- Ensures coherence
4. Reviewer Agent
- Checks facts
- Improves writing
Workflow:
Researcher → [Market data]
↓
Analyzer → [Insights]
↓
Writer → [Draft]
↓
Reviewer → [Final Report]
Coordination Mechanisms:
1. Central Coordinator
Manager collects info from all agents
Manager decides next steps
Agents execute manager decisions
Pros: Clear control, consistency
Cons: Bottleneck, less autonomy
2. Consensus
Agents vote on decision
Majority wins or unanimous required
Pros: Collaborative, balanced
Cons: Slow, deadlock possible
3. Auction
Manager broadcasts task
Agents bid (based on capability)
Lowest cost agent wins task
Pros: Efficient, scalable
Cons: Complexity
4. Contract Negotiation
Agent A needs help with X
Agent B offers help at cost Y
Negotiate terms
Execute agreement
Pros: Flexible, balanced
Cons: Complex negotiations
Conflict Resolution:
Agents may disagree:
Agent A: "Price will rise" (confidence: 80%)
Agent B: "Price will fall" (confidence: 70%)
Resolution strategies:
1. Weighted average: Combine both opinions
2. Voting: Higher confidence wins
3. Authority: Senior agent decides
4. Escalation: Ask manager/human
Information Sharing:
class SharedKnowledgeBase:
def __init__(self):
self.facts = {} # Shared data
self.lock = threading.Lock()
def add_fact(self, agent_id, key, value):
"""Add discovered fact"""
with self.lock:
self.facts[key] = {
"value": value,
"source": agent_id,
"timestamp": time.time()
}
def query(self, key):
"""Retrieve fact"""
return self.facts.get(key)
def search(self, query):
"""Search facts"""
results = []
for key, data in self.facts.items():
if query.lower() in key.lower():
results.append(data)
return results
Scalability Considerations:
Small system (3-5 agents):
- Direct communication fine
- Simple coordinator
Medium system (5-20 agents):
- Message queue
- Publish-subscribe
Large system (20+ agents):
- Distributed architecture
- Multiple coordinators
- Hierarchical organization
Example: Software Development Team
Coordinator: Project Manager
Agents:
1. Requirements Agent
- Clarifies requirements
- Documents specs
- Output: Requirements doc
2. Design Agent
- Creates architecture
- Plans database
- Output: Design doc
3. Development Agent
- Writes code
- Follows design
- Output: Code
4. Testing Agent
- Writes tests
- Runs tests
- Output: Test results
5. Documentation Agent
- Writes guides
- Creates API docs
- Output: Documentation
Workflow:
Requirements → Design → Development → Testing → Documentation → Release
Agent Evaluation & Testing
Q7: How do you evaluate and test agents?
Answer:
Evaluation Dimensions:
1. Task Success Rate
Percentage of tasks completed successfully
50 tasks assigned
40 completed correctly
Success rate: 80%
Breakdown by task type:
- Simple tasks: 95%
- Complex tasks: 60%
2. Efficiency
How many steps/tokens needed
Task: Book flight
Optimal steps: 4 (search → select → book → confirm)
Agent steps: 6 (extra: clarification, retry)
Efficiency: 67%
Token usage:
- Baseline: 500 tokens
- Agent used: 450 tokens
- Savings: 10%
3. Accuracy
Correctness of decisions/outputs
100 decisions made
98 were correct
Accuracy: 98%
Hallucinations detected: 2
False refusals: 0
4. Latency
Time to complete task
Task 1: 2 seconds
Task 2: 5 seconds
Task 3: 1 second
Average: 2.67 seconds
Acceptable: < 5 seconds
Status: ✓ Meets requirement
5. Cost
Resources used
Per task cost:
- LLM tokens: $0.05
- Tool calls: $0.02
- Compute: $0.03
- Total: $0.10 per task
6. Robustness
Handling edge cases and errors
Tests:
- Invalid input: ✓ Handled gracefully
- Missing data: ✓ Requests clarification
- Network error: ✓ Retries
- Unknown tool: ✓ Asks for help
- Timeout: ✓ Graceful degradation
Testing Strategies:
Unit Testing
Test individual components:
def test_tool_weather():
"""Test weather tool"""
result = get_weather("Paris")
assert result["temperature"] is not None
assert result["condition"] in ["sunny", "cloudy", "rainy"]
def test_agent_initialization():
"""Test agent setup"""
agent = Agent()
assert agent.tools is not None
assert agent.memory is not None
assert agent.llm is not None
Integration Testing
Test agent + tools together:
def test_agent_with_tools():
"""Test full agent with tools"""
agent = Agent(tools=[search, calculator])
result = agent.run("What's 50% of 200?")
assert "100" in result or "100.0" in result
result = agent.run("Weather in NYC?")
assert "temperature" in result.lower()
End-to-End Testing
Test complete user scenarios:
def test_booking_scenario():
"""Test flight booking scenario"""
agent = Agent()
# Simulate user
agent.handle("Book a flight from NYC to LA")
agent.handle("Next Friday")
agent.handle("Economy class")
# Check results
assert agent.flight_booked is True
assert agent.destination == "LA"
assert agent.date == "next_friday"
Evaluation Metrics:
Task-Specific Metrics:
For summarization:
ROUGE score: Compare with reference
Factuality: Check against source
Conciseness: Token ratio
For code generation:
Pass@K: Does generated code work?
Compilation: No syntax errors
Tests: Pass predefined test suite
For reasoning:
Explanation clarity: Human rated
Step correctness: Each step valid
Conclusion soundness: Logical conclusion
Benchmark Datasets:
Tool use benchmarks:
- API-Bank: Function calling
- ToolBench: Tool use
- ReAct benchmark: Reasoning + acting
Planning benchmarks:
- BlocksWorld: Task planning
- SOKOBAN: Problem solving
Multi-agent:
- DAWN: Dialogue planning
- TOD: Task-oriented dialogue
Human Evaluation:
class HumanEvaluation:
metrics = {
"correctness": "Is output correct?",
"helpfulness": "Does it help user?",
"clarity": "Is it clear?",
"completeness": "Does it address all aspects?",
"safety": "Is it safe?"
}
rating_scale = 1-5 # Likert scale
inter_rater_agreement = "Cohen's Kappa"
process:
1. Sample 100 outputs
2. Get 3 human raters each
3. Rate on metrics
4. Calculate agreement
5. Investigate disagreements
6. Produce final score
Failure Analysis:
Categorize failures:
Failure Types:
1. Tool error (30%): Tool returned wrong result
2. Reasoning error (40%): Bad decision/logic
3. Hallucination (20%): Made up information
4. Tool selection (10%): Wrong tool chosen
Action items:
- Tool error: Fix tool
- Reasoning error: Better prompting
- Hallucination: Add grounding
- Tool selection: Better descriptions
Ablation Studies:
Remove components, measure impact:
Baseline: Full agent (80% success)
- No memory: 75% success (5% impact)
- No chain-of-thought: 70% success (10% impact)
- No tool use: 40% success (40% impact)
Conclusion: Tool use most important
Monitoring in Production:
def monitor_agent():
metrics = {
"success_rate": calculate_success_rate(),
"avg_latency": calculate_avg_latency(),
"error_rate": calculate_error_rate(),
"cost_per_task": calculate_cost(),
"hallucination_rate": detect_hallucinations()
}
# Alert if metrics degrade
if metrics["success_rate"] < 0.75:
alert("Success rate dropped below 75%")
# Track trends
log_metrics(metrics, timestamp=now())
Observability, Security & Ethics
Debugging and monitoring
Production agents need:
- structured logging
- tracing
- request IDs
- tool-call audit trails
- latency and token metrics
- failure categorization
Useful observability tools:
- LangSmith
- Helicone
- Langfuse
- OpenLIT or OpenTelemetry-based tracing
Evaluation and testing stack
Metrics to track:
- task completion rate
- tool success rate
- groundedness / hallucination rate
- retrieval precision
- latency
- cost per task
- user satisfaction
- escalation rate
Testing layers:
- unit testing for individual tools
- integration testing for full flows
- human-in-the-loop evaluation
- regression test sets
Common evaluation tools:
- LangSmith
- Ragas
- DeepEval
Security & ethics topics
You should be ready to discuss:
- prompt injection and jailbreaks
- tool sandboxing and permissioning
- data privacy and PII redaction
- bias and toxicity guardrails
- safety and red-team testing
Interview Topic: Securing and monitoring an AI agent in production
Answer:
Security controls:
- validate tool inputs and outputs
- sandbox code execution tools
- limit file system and network permissions
- redact sensitive data
- separate user instructions from retrieved content
- require human approval for high-risk actions
Monitoring controls:
- log every tool call
- trace multi-step flows
- track latency, token usage, and failure modes
- evaluate drift and regressions continuously
Interview answer:
- Secure agent design is not only about the model.
- It is about the surrounding system: permissions, validation, observability, and escalation paths.
Real-World Applications
Q8: What are real-world applications of agentic AI?
Answer:
1. Customer Service
24/7 automated support with escalation
Scenario: Customer calls with issue
Agent:
1. Understands problem
2. Checks knowledge base
3. Provides solution
4. If complex: Escalates to human
5. Follows up
Benefits:
- Instant response (no wait)
- Available 24/7
- Consistent service
- Reduces cost by 60%
2. Data Analysis & Reporting
Automated insights from data
Task: Analyze sales trends
Agent:
1. Retrieves sales data
2. Cleans and processes
3. Identifies patterns
4. Creates visualizations
5. Writes report
6. Sends to stakeholders
Replaces: Manual analysis (days) → Automated (minutes)
3. Code Development
AI pair programmer
Task: Implement feature X
Agent:
1. Understands requirements
2. Designs solution
3. Generates code
4. Writes tests
5. Debugs and refines
6. Creates documentation
Productivity: 2-3× faster than manual
Quality: Better error handling and testing
4. Research & Literature Review
Automated research assistance
Task: Literature review on AI safety
Agent:
1. Searches academic databases
2. Identifies relevant papers
3. Summarizes key findings
4. Organizes by theme
5. Creates comprehensive report
6. Highlights gaps
Traditional: Weeks of work
Automated: Hours
5. Personal Assistant
Manages tasks and schedules
Capabilities:
- Schedule meetings
- Send emails
- Research topics
- Summarize documents
- Manage calendar
- Make reservations
- Process expenses
Result: Saves 5-10 hours/week per person
6. E-commerce & Recommendations
Intelligent shopping assistance
User: "I need a laptop under $1000 for programming"
Agent:
1. Searches product inventory
2. Filters by specs and price
3. Compares reviews
4. Checks availability
5. Shows best options
6. Handles purchase
Better than: Static filters or generic recs
7. Financial Analysis & Trading
Market analysis and decisions
Task: Find trading opportunities
Agent:
1. Monitors market data
2. Analyzes trends
3. Evaluates risk/reward
4. Makes recommendations
5. Executes trades (if allowed)
Risk management: AI can spot patterns faster
8. Project Management
Autonomous project orchestration
Task: Execute marketing campaign
Agent:
1. Breaks into tasks
2. Assigns to team
3. Monitors progress
4. Identifies blockers
5. Reallocates resources
6. Reports status
Benefits: Better coordination, faster completion
9. Quality Assurance & Testing
Automated testing and bug finding
Task: Test software
Agent:
1. Creates test cases
2. Runs tests
3. Logs bugs
4. Prioritizes issues
5. Retests fixes
6. Creates test report
Improves: Coverage, finds edge cases
10. Recruitment & HR
Resume screening and interviewing
Task: Hire software engineer
Agent:
1. Screens resumes
2. Assesses qualifications
3. Schedules interviews
4. Conducts technical interview
5. Evaluates candidates
6. Recommends hiring decision
Saves: 80% of recruiter time
Implementation Examples:
Example 1: Expense Management
User expense: "Coffee at Starbucks - $5.50"
Agent process:
1. Detect: Coffee = Food/Beverage
2. Classify: Meals, not entertainment
3. Check: Within daily limit? Yes
4. Store: Add to expense report
5. Notify: "Added $5.50 to Meals"
Monthly:
Agent automatically:
- Sums by category
- Flags unusual spending
- Creates report
- Submits for approval
Example 2: Customer Support
Customer: "My order hasn't arrived"
Agent:
Thought: Customer has delivery issue
Action: search_order("customer_id_123")
Observation: Order ID#456, shipped 5 days ago, tracking available
Thought: Need to check tracking
Action: get_tracking("456")
Observation: Expected delivery tomorrow, on track
Final: "Your order is on track and should arrive tomorrow!"
If delayed:
- Agent would offer refund or replacement
- Escalate if needed
Challenges & Solutions
Q9: What are key challenges with agentic AI?
Answer:
1. Hallucinations
Agent generates false information confidently
User: "Who won the 2024 Olympics?"
Agent: "USA with 150 medals" [MADE UP]
Causes:
- LLM patterns over knowledge
- Limited grounding
- Confidence without verification
Solutions:
1. Fact-checking tool: Verify claims
2. RAG: Ground in real data
3. Confidence scoring: Only state confident answers
4. Fine-tuning: Train on factual data
5. Human-in-the-loop: Review before responding
2. Goal Misalignment
Agent pursues goals in unintended way
Goal: "Maximize user satisfaction"
Agent: Only tells user what they want to hear (hallucinations)
Goal: "Complete task quickly"
Agent: Skips important steps for speed
Solutions:
1. Specify constraints: "Must be truthful"
2. Multi-objective: Balance speed + quality
3. Value alignment: RLHF to match values
4. Oversight: Human monitoring
5. Testing: Adversarial testing
3. Tool Misuse
Agent uses wrong tool or misuses correctly
Wrong tool selection:
User: "What's the weather?"
Agent calls: search_internet (slow, inaccurate)
Should call: get_weather (direct, accurate)
Misuse:
User: "Delete my account"
Agent: Without confirmation, deletes account
Should: Ask for confirmation first
Solutions:
1. Clear tool descriptions
2. Usage examples
3. Constraints (requires confirmation)
4. Tool validation
5. Permission checks
4. Infinite Loops
Agent gets stuck repeating actions
Agent tries same tool repeatedly
Tool fails → Agent retries → Tool fails → ...
No progress made
Causes:
- No stopping condition
- Error not detected
- Wrong understanding of situation
Solutions:
max_iterations = 10
iteration_count = 0
while not goal_achieved and iteration_count < max_iterations:
thought = agent.think()
action = agent.decide_action(thought)
observation = execute(action)
iteration_count += 1
if iteration_count >= max_iterations:
agent.ask_human_help()
5. Context Length Limitations
Can’t process very long information
Document: 50 pages
Context window: 8K tokens (~5 pages)
Agent can't see full document
Result: Misses important context
Solutions:
1. Summarization: Compress important parts
2. Chunking: Process pieces separately
3. Retrieval: Search specific sections
4. Hierarchical: Multi-agent approach
5. Better models: Longer context windows
6. Latency Issues
Multiple tool calls slow down execution
Task needs 5 tool calls
Each call: 2 seconds
Total: 10 seconds (too slow for real-time)
Solutions:
1. Parallel execution: Call multiple tools at once
2. Caching: Reuse recent results
3. Model optimization: Faster LLM
4. Tool optimization: Faster tools
5. Hybrid: Simple path + detailed path
7. Cost Explosion
Large agents using many tokens
Single task: 10K tokens
Cost per 1M: $1
Cost per task: $0.01
1M tasks/month: $10K/month
Scale issues: Can become expensive
Solutions:
1. More efficient prompts: Shorter tokens
2. Smaller models: Cheaper inference
3. Caching: Reuse computations
4. Batching: Process multiple together
5. Tiering: Use cheap model first, upgrade if needed
8. Unpredictable Behavior
LLMs introduce randomness
Identical input:
Run 1: "I can do this"
Run 2: "I cannot do this"
Makes debugging hard
Solutions:
1. Lower temperature: More deterministic
2. Reproducible seed: Same randomness
3. Testing: Multiple runs
4. Guardrails: Constrain output
5. Fallbacks: Plan for failures
9. Poor Tool Integration
Tools and agent don’t work well together
Tool expects: {"date": "YYYY-MM-DD"}
Agent provides: "next Friday"
Tool fails: Invalid format
Solutions:
1. Type hints: Clear data types
2. Validation: Check before calling
3. Adaptation: Convert formats
4. Error messages: Clear feedback
5. Testing: Verify tool integration
10. Monitoring & Debugging
Hard to understand why agent failed
Agent task failed
Why?
- LLM decision wrong?
- Tool returned bad data?
- Reasoning error?
- Input misunderstood?
Answer: Unclear, need to investigate
Solutions:
# Detailed logging
logger.debug(f"Thought: {thought}")
logger.debug(f"Action: {action}")
logger.debug(f"Observation: {observation}")
logger.debug(f"Confidence: {confidence}")
# Error tracking
try:
execute()
except Exception as e:
logger.error(f"Error: {e}", exc_info=True)
capture_state(state)
# Debugging UI
Show thinking process
Show tool calls
Show observations
Allow step-by-step replay
Interview Questions & Answers
Q10: Explain the difference between an agent and a chatbot
Answer:
Chatbot:
Pattern matching → Response
Input → Process → Output
Stateless (typically)
No real action
User asks → Bot answers
Example:
User: "What's the weather?"
Bot: "I'm just a chatbot. I can't check weather. Ask a weather app."
Agent:
Goal → Plan → Act → Observe → Reason → Loop
Stateful (remembers context)
Takes real actions via tools
Pursues goals autonomously
User: "Book me a flight"
Agent: Plans, searches flights, compares prices, books
Comparison Table:
| Aspect | Chatbot | Agent |
|---|---|---|
| Interaction | Reactive (responds to user) | Proactive (pursues goals) |
| Action | Talks only | Takes real actions |
| State | Stateless | Maintains state |
| Goals | Answer queries | Accomplish tasks |
| Memory | Limited | Persistent |
| Autonomy | Limited | High |
| Tools | Information retrieval | Tools + APIs |
| Task Completion | User manually does | Agent automates |
Example Scenario:
User: “I want to book a flight to Paris”
Chatbot Response:
"To book a flight to Paris, you can:
1. Go to airline website
2. Enter departure city
3. Enter date
4. Select flight
5. Enter payment info
6. Confirm booking
Would you like me to explain any step?"
(Chatbot just explains, doesn’t act)
Agent Response:
Thought: User wants flight to Paris
Action: search_flights(to="Paris", when="?")
Observation: Need departure city and date
Thought: Missing critical info
Action: ask_user("When do you want to leave and from where?")
User: "From NYC next Friday"
Action: search_flights(from="NYC", to="Paris", date="2025-01-24")
Observation: [10 flights listed]
Thought: Show options and help choose
Action: show_options_and_ask("Which flight interests you?")
User: "The 7am United flight"
Action: book_flight(airline="United", time="7am", confirm=True)
Observation: Flight booked, confirmation #ABC123
Final: "Done! Your flight is booked. Confirmation: ABC123"
(Agent takes actual action)
Q11: How would you design an agent system from scratch?
Answer:
Step 1: Define Scope & Requirements
Questions:
- What problem does agent solve?
- What capabilities needed?
- What tools available?
- User expectations?
- Constraints (latency, cost, etc)?
Example:
Problem: Reduce time spent on expense reporting
Capabilities: Extract expenses, categorize, generate reports
Tools: Email, spreadsheet, document generator
User: Finance team, CFO
Constraints: Sub-second response, <$0.50/expense
Step 2: Design Architecture
Components:
┌─────────────────────────────────┐
│ User Interface │
├─────────────────────────────────┤
│ Agent Orchestrator │
├─────────────────────────────────┤
│ LLM (GPT-4, Claude, etc) │
├─────────────────────────────────┤
│ Tool Manager │
├─────────────────────────────────┤
│ Tools (search, calc, DB, etc) │
├─────────────────────────────────┤
│ Memory/State Manager │
├─────────────────────────────────┤
│ Monitoring & Logging │
└─────────────────────────────────┘
Step 3: Identify Tools
What can agent do?
For expense management:
1. Extract text from email
2. Classify expense category
3. Query expense limits
4. Store in database
5. Generate expense report
6. Notify user
Define tool specs:
tools = [
{
"name": "extract_expense",
"description": "Extract amount and category from text",
"input": {"text": "string"},
"output": {"amount": float, "category": string}
},
...
]
Step 4: Choose Architecture Pattern
Simple: ReAct
Thought → Action → Observation → Repeat
Complex: Hierarchical
Manager Agent
├─ Extraction Agent
├─ Classification Agent
└─ Reporting Agent
Step 5: Implement Core Loop
class Agent:
def __init__(self, llm, tools, memory):
self.llm = llm
self.tools = tools
self.memory = memory
def run(self, goal):
context = self.memory.get_context()
max_iterations = 10
for i in range(max_iterations):
# Think
thought = self.llm.generate(
goal=goal,
context=context,
tools=self.tools
)
# Decide action
if thought.should_answer:
return thought.final_answer
action = thought.action
tool_name = action.tool
arguments = action.arguments
# Execute
try:
observation = self.execute_tool(tool_name, arguments)
except Exception as e:
observation = f"Error: {e}"
# Update context
context.append({
"thought": thought.text,
"action": action,
"observation": observation
})
self.memory.add(context[-1])
return "Max iterations reached, couldn't complete"
def execute_tool(self, tool_name, arguments):
tool = self.tools[tool_name]
return tool(**arguments)
Step 6: Add Safety Guardrails
class SafetyGuardrails:
def __init__(self):
self.max_iterations = 10
self.max_tokens = 100000
self.allowed_tools = set()
self.max_cost = 10.0
def validate(self, action):
"""Check if action is safe"""
# Check tool exists and allowed
if action.tool not in self.allowed_tools:
raise ValueError(f"Tool {action.tool} not allowed")
# Check arguments
if action.arguments is None:
raise ValueError("Arguments required")
# Check cost
if self.estimated_cost > self.max_cost:
raise ValueError("Cost limit exceeded")
return True
Step 7: Add Memory
class MemoryManager:
def __init__(self):
self.short_term = [] # Current conversation
self.long_term = {} # Persistent storage
def add_interaction(self, thought, action, observation):
self.short_term.append({
"thought": thought,
"action": action,
"observation": observation,
"timestamp": time.time()
})
def summarize_and_store(self):
"""Move old interactions to long-term"""
if len(self.short_term) > 20:
summary = self.summarize(self.short_term[:10])
self.long_term[f"session_{time.time()}"] = summary
self.short_term = self.short_term[10:]
def get_context(self):
"""Get context for LLM"""
return self.short_term + self.retrieve_relevant()
Step 8: Testing
def test_agent():
agent = Agent(
llm=mock_llm,
tools=test_tools,
memory=MemoryManager()
)
# Test simple task
result = agent.run("Extract expense from 'Lunch at restaurant: $25'")
assert "25" in result
# Test complex task
result = agent.run("Process 3 expenses and generate report")
assert "report" in result.lower()
# Test error handling
result = agent.run("Call non-existent tool")
assert "error" in result.lower() or "invalid" in result.lower()
Step 9: Deployment
# API
app = FastAPI()
@app.post("/agent/run")
def run_agent(request: AgentRequest):
agent = load_agent()
result = agent.run(request.goal)
return {"result": result}
# Monitoring
@app.get("/metrics")
def metrics():
return {
"success_rate": calculate_success_rate(),
"avg_latency": calculate_latency(),
"error_rate": calculate_error_rate()
}
# Start server
uvicorn.run(app, host="0.0.0.0", port=8000)
Step 10: Iterate & Improve
Month 1: Basic functionality
Month 2: Add more tools, improve prompts
Month 3: Optimization (speed, cost)
Month 4: Advanced features (multi-agent)
Monitor metrics, gather feedback, improve continuously.
Q12: What’s the difference between agents and traditional APIs?
Answer:
Traditional API:
Client → API Request
API → Process → Response
Rigid interface:
- Fixed endpoints
- Defined parameters
- Predictable responses
Agents:
Client → Natural language goal
Agent → Reason → Tool calls → Result
Flexible interface:
- Understands intent
- Chooses actions
- Adaptive behavior
Comparison:
| Aspect | API | Agent |
|---|---|---|
| Interface | Fixed endpoints | Natural language |
| Intent | Must specify exactly | Understands from description |
| Flexibility | Rigid | Adaptive |
| Error handling | Returns error | Tries alternative approach |
| Learning curve | Must study docs | Intuitive |
| Complexity | Simple → Complex needs engineering | Complex → Elegant |
Example:
Weather API:
GET /api/weather?city=Paris&units=celsius
Response: {"temp": 8, "condition": "cloudy"}
User must know:
- Endpoint exists
- Parameter names
- Required format
- Response format
Weather Agent:
User: "Is it cold in Paris?"
Agent: "Paris is 8°C, which is cold for average"
Agent figured out:
- What data needed
- How to retrieve
- How to interpret
- How to answer naturally
When to use each:
Use APIs when:
- Need raw data
- Performance critical
- Well-defined interface
- Standard tool
- Cost-sensitive
Use Agents when:
- Need intelligent automation
- Handling complex workflows
- Natural interaction
- Handling variation
- Adaptability important
Hybrid Approach:
Combine both:
Frontend → Agent → API calls → Backend
User speaks to agent naturally
Agent calls APIs to get data
Provides intelligent response
Example:
User: "What's a good restaurant near me?"
Agent:
1. Extract location (user location)
2. Query restaurant API
3. Filter by ratings, distance
4. Sort by relevance
5. Present options naturally
Q13: How do you handle agent errors and failures?
Answer:
Error Types:
1. Tool Errors Tool doesn’t work correctly
Action: search("query")
Observation: Tool crashed / Returned bad data
Response: Log error, try alternative tool or retry
2. Decision Errors Agent chose wrong action
Thought: "I should search the web"
Action: search(...)
Observation: Gets irrelevant results
Correct action: Should use specific database API
3. Planning Errors Wrong plan for goal
Goal: "Book a flight"
Plan: Search → Verify → Book
Issue: Didn't check passport validity
Correct: Check passport → Search → Verify → Book
4. Hallucinations Agent makes up information
Agent: "The answer is X"
Reality: X is completely false
Issue: Agent didn't verify
Error Handling Strategies:
1. Retry Logic
def execute_tool_with_retry(tool, arguments, max_retries=3):
for attempt in range(max_retries):
try:
return tool(**arguments)
except ToolError as e:
if attempt < max_retries - 1:
wait(exponential_backoff(attempt))
continue
else:
raise
2. Fallback Mechanisms
primary_tool = search_web
fallback_tools = [search_database, search_cache]
try:
result = primary_tool(query)
except:
for tool in fallback_tools:
try:
result = tool(query)
break
except:
continue
3. Error Explanation
action = {"tool": "search", "query": "Paris"}
try:
observation = execute(action)
except Exception as e:
# Explain error to LLM
error_explanation = f"Tool failed: {str(e)}"
context.append(error_explanation)
# Let LLM decide next action
thought = llm.generate(context)
# LLM might: Retry with different params, use different tool, ask user
4. Human Escalation
def handle_complex_error(error, context):
if error.severity == "high":
# Can't handle automatically
return escalate_to_human(context)
else:
# Try to recover
return retry_with_modification(context)
5. Graceful Degradation
# Primary: Full solution
try:
full_result = get_complete_analysis()
return full_result
except:
# Secondary: Partial solution
try:
partial_result = get_basic_analysis()
return {partial_result, "note": "Limited analysis"}
except:
# Tertiary: Fallback
return "Unable to complete. Please try again."
6. Validation Layer
class Validator:
def validate_action(self, action):
"""Validate before execution"""
if not action.tool:
raise ValueError("Tool not specified")
if not self.tool_exists(action.tool):
raise ValueError("Tool doesn't exist")
if not self.validate_arguments(action):
raise ValueError("Invalid arguments")
return True
def validate_observation(self, observation):
"""Validate tool result"""
if observation is None:
return False, "Null result"
if not isinstance(observation, expected_type):
return False, "Wrong type"
return True, ""
7. Monitoring & Alerting
def monitor_execution(agent_state):
metrics = {
"tool_success_rate": calculate_success_rate(),
"error_rate": calculate_error_rate(),
"retry_count": count_retries(),
"human_escalations": count_escalations()
}
if metrics["error_rate"] > 0.1:
alert("High error rate detected")
if metrics["human_escalations"] > 50:
alert("Many escalations, investigate issues")
log_metrics(metrics)
8. Recovery Strategies
recovery_strategies = {
"timeout": {
"action": "retry_with_longer_timeout",
"max_attempts": 2
},
"rate_limit": {
"action": "exponential_backoff",
"max_attempts": 3
},
"authentication": {
"action": "refresh_credentials",
"then": "retry"
},
"invalid_input": {
"action": "ask_user_for_clarification",
"then": "retry"
}
}
Best Practices:
- Clear error messages: Help debugging
- Logging everything: Trace execution path
- Fail fast: Don’t keep trying if will fail
- User communication: Inform about issues
- Metrics tracking: Monitor error patterns
- Regular testing: Catch errors early
- Graceful degradation: Partial > nothing
Q14: How would you evaluate if an agentic AI solution is better than traditional software?
Answer:
Comparison Framework:
Cost Comparison:
Traditional:
Development: 1000 hours → $100K
Maintenance: 100 hours/year → $10K/year
Total 5-year: $150K
Agentic AI:
Setup: 100 hours → $10K
Maintenance: 20 hours/year → $2K/year
API costs: $50/month → $3K/year
Total 5-year: $35K
Conclusion: AI 4× cheaper
Performance Comparison:
Task: Process 1000 invoices
Traditional:
Manual: 40 hours (40 invoices/hour)
Cost: $2000 (at $50/hr)
Accuracy: 95%
AI Agent:
Automated: 2 minutes
Cost: $20 (compute + API)
Accuracy: 98%
Savings: $1980 + 40 hours
Quality Comparison:
| Metric | Traditional | AI Agent |
|---|---|---|
| Accuracy | 95% | 98% |
| Consistency | 90% (varies by person) | 99% |
| Handling exceptions | 70% (needs review) | 95% (auto-handles) |
| Speed | Slow | Very fast |
| Learning | Static | Improves over time |
User Experience:
Traditional:
- Long process
- Manual steps
- Wait times
- Back-and-forth
AI Agent:
- Quick
- Automated
- Instant feedback
- Natural interaction
Scalability:
Traditional:
Process 1000 items: 40 hours (linear cost)
Process 10000 items: 400 hours (not practical)
AI Agent:
Process 1000 items: 2 minutes
Process 10000 items: 20 minutes
Process 100000 items: 200 minutes
Linear scaling, can handle volume
Risk Analysis:
Traditional:
- Risk: Manual errors
- Mitigated by: Reviews, checks
- Cost of failure: Rework (10-20%)
AI Agent:
- Risk: Hallucinations, wrong decisions
- Mitigated by: Validation, escalation
- Cost of failure: Rework (5-10%)
Decision Framework:
Choose Traditional if:
- ✓ Existing robust system
- ✓ Very specific domain
- ✓ Rarely changes
- ✓ Simple workflow
- ✓ Regulatory restrictions
Choose AI Agent if:
- ✓ New problem/no existing solution
- ✓ Varied inputs (not fixed)
- ✓ Frequent changes
- ✓ Complex multi-step process
- ✓ Cost/speed critical
Hybrid Approach:
Combine both:
Traditional: Core critical business logic
AI Agent: Handle variations, exceptions
Result: Best of both worlds
Example:
Payment processing (traditional: critical)
Dispute handling (AI agent: variable)
Q15: What are emerging trends in agentic AI?
Answer:
1. Multi-Agent Orchestration
Trend: Complex systems with many specialized agents
Finance agents:
- Budget agent
- Investment agent
- Tax agent
- Reporting agent
Coordinated by master agent
Benefits:
- Specialization
- Scalability
- Fault isolation
- Parallel execution
2. Agent Memory & Learning
Trend: Agents that improve from experience
First task: New, slow, many errors
Tenth task: Same type, fast, few errors
Agent learned from experience
Stored patterns in memory
Reuses solutions
Implementation:
- Episodic memory (remembering specific events)
- Semantic memory (learned patterns)
- Continual learning from feedback
3. Agentic Reasoning Models
Trend: New models optimized for reasoning/planning
Examples:
- OpenAI o1 (thinking model, strong reasoning)
- Deepseek R1 (open-source reasoning)
- Anthropic Claude 4 (improved reasoning)
Benefits:
- Better planning
- Fewer errors
- More complex tasks
- More reliable
4. Real-Time Adaptation
Trend: Agents that adapt as situation changes
Original plan: A → B → C
New info: B not available
Adapt: A → D → C (alternative path)
Implementation:
- Monitor environment continuously
- Detect plan invalidity
- Generate alternatives
- Execute new plan
5. Embodied AI Agents
Trend: Agents controlling physical systems
Examples:
- Robots performing tasks
- Autonomous vehicles
- Manufacturing automation
- Warehouse robots
Challenges:
- Real-time constraints
- Physical safety
- Handling uncertainty
- Complex perception
6. Tool/API Standardization
Trend: Standard ways to define tools/APIs
OpenAI: Function calling
Anthropic: Tool use format
Hugging Face: Agent format
Moving toward: Standard format Benefit: Interoperability
7. Explainability in Agents
Trend: Understanding why agents decide
Methods:
- Tracing reasoning steps
- Attention visualization
- Attribution methods
- Counterfactual explanations
Importance:
- Trust (especially critical systems)
- Debugging
- Compliance/regulation
- User understanding
8. Constitutional AI for Agents
Trend: Agents with explicit principles
Principles:
1. Be helpful
2. Be harmless
3. Be honest
4. Respect privacy
5. Follow instructions
Agent: Self-reviews against principles
Self-corrects if violates
9. Federated Agents
Trend: Distributed agents coordinating
Healthcare agents:
- Hospital 1 agent
- Hospital 2 agent
- Pharma agent
Coordinate without sharing data
Maintain privacy
Achieve common goal
10. Agent Frameworks Maturing
Trend: Better tools and libraries
Examples:
- LangChain (most popular)
- AutoGPT
- CrewAI (multi-agent)
- LlamaIndex (RAG + agents)
- OpenAI Assistants API
Maturation:
- Better abstractions
- More examples
- Better documentation
- Community support
Future Outlook:
Near-term (2025-2026):
- Multi-agent systems mainstream
- Better reasoning models
- Improved tool integration
- Agent orchestration platforms
Medium-term (2026-2028):
- Embodied agents (robotics)
- Long-term learning agents
- Autonomous organizations
- Regulatory framework established
Long-term (2028+):
- AGI-level agents?
- Self-improving systems
- Agent economies
- New form of automation
Q16: What are AI agents in one sentence?
Answer:
AI agents are systems that use models plus tools, memory, and iterative control loops to pursue goals and take actions rather than only generate text.
Q17: What are tools in an AI agent?
Answer:
Tools are callable actions that let an agent interact with the world, such as searching the web, querying a database, running code, sending messages, or reading and writing files.
Q18: What is the agent loop?
Answer:
The agent loop is:
- perceive input
- reason and plan
- act through tools
- observe results
- reflect and continue or stop
Q19: What is the difference between episodic and semantic memory?
Answer:
Episodic memory stores prior experiences and interactions. Semantic memory stores durable facts such as user preferences, account data, or domain knowledge.
Q20: What is a RAG agent?
Answer:
A RAG agent combines retrieval with generation and usually also has tools. It retrieves relevant documents before answering, which improves grounding and reduces hallucinations.
Q21: What is a planner-executor architecture?
Answer:
A planner-executor architecture separates task decomposition from task execution. One component plans the steps, and another component carries them out, which improves modularity and control.
Q22: What is a DAG agent?
Answer:
A DAG agent organizes tasks as a directed acyclic graph where nodes are tasks and edges are dependencies. It is useful for parallelizable workflows and dependency-aware execution.
Q23: What is MCP and why does it matter for agents?
Answer:
MCP is the Model Context Protocol. It standardizes how models connect to external tools, resources, and services, making integrations more portable and structured.
Q24: Why do AI agent engineers need backend, Git, terminal, and API skills?
Answer:
Because real agents are software systems. They need API integration, state management, retries, auth, testing, deployment, version control, and operational debugging.
Q25: How should you talk about common model pricing in interviews?
Answer:
Focus on the pricing model rather than memorizing every number:
- providers charge per input and output token
- cached input is often cheaper
- larger reasoning models cost more than smaller standard models
- tools, retrieval, long context, and streaming all affect total spend
Then mention that exact prices change and should be checked on the provider’s official pricing page.
Summary & Key Takeaways
Core Concepts:
- Agents autonomously pursue goals
- Iterative loops: Think → Act → Observe → Reason
- Tools enable real-world action
- Memory and state crucial
- Multi-agent systems for complex problems
- Strong prerequisites matter: backend, APIs, Git, shell, LLM basics
- Retrieval, evaluation, observability, and security are production requirements
- Roadmap.sh tracks provide a strong interview learning sequence
Design Principles:
- Start simple, grow complex
- Clear tool definitions
- Robust error handling
- Human oversight where needed
- Continuous monitoring
Implementation Tips:
- Use frameworks (LangChain, CrewAI)
- Test thoroughly
- Plan for failure
- Monitor metrics
- Iterate based on feedback
- Understand pricing and context budgets early
- Add human approval for high-risk actions
When to Use:
- Complex workflows
- Variable inputs
- Automation opportunities
- Cost/speed critical
- Need adaptability
Good luck with your agentic AI interviews!
Sources
- AI Agents roadmap: https://roadmap.sh/ai-agents
- AI Engineer PDF roadmap: https://roadmap.sh/pdfs/roadmaps/ai-engineer.pdf
- MLOps PDF roadmap: https://roadmap.sh/pdfs/roadmaps/mlops.pdf
- Prompt Engineering PDF roadmap: https://roadmap.sh/pdfs/roadmaps/prompt-engineering.pdf
- Backend PDF roadmap: https://roadmap.sh/pdfs/roadmaps/backend.pdf
- Git and GitHub PDF roadmap: https://roadmap.sh/pdfs/roadmaps/git-github.pdf
- API Design PDF roadmap: https://roadmap.sh/pdfs/roadmaps/api-design.pdf
- roadmap.sh roadmaps hub: https://roadmap.sh/roadmaps
- OpenAI API pricing: https://openai.com/api/pricing/
- Anthropic pricing: https://docs.anthropic.com/en/docs/about-claude/pricing
- Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing?hl=en