Tutorials | June 7, 2025 | 9 min read

Prompt Engineering for Developers: 10 Techniques That Actually Work in 2025

The prompt engineering techniques that produce consistent, measurable improvements — chain of thought, few-shot, role prompting, self-consistency, and more. With real code examples.

Why Prompt Engineering Still Matters in 2025

Modern LLMs are capable enough that bad prompts sometimes still produce good results, which fools developers into thinking prompt quality does not matter. It does. The gap between a naive prompt and a well-engineered one can be the difference between a feature that works 60% of the time and one that works 95% of the time.

This guide covers the ten techniques that produce real, measurable improvements. Every example is runnable against a FreeLLMKeys API key.

Setup

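from openai import OpenAI

# The OpenAI SDK works with any OpenAI-compatible endpoint via base_url.
client = OpenAI(
    base_url="https://aiapiv2.pekpik.com/v1",
    api_key="sk-your-freellmkeys-key"  # replace with your own key
)

def prompt(system: str, user: str, model: str = "gpt-4o") -> str:
    """Send one system + user message pair and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ]
    )
    # message.content can be None (for example on tool calls), so fall back to "".
    return response.choices[0].message.content or ""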

1. Chain of Thought (CoT) Prompting

Add "Think step by step" or "Let's think through this carefully" to reasoning-heavy tasks. This prompts the model to write out its intermediate reasoning before answering, which often noticeably improves accuracy on math, logic, and multi-step problems.

# Without CoT
bad = prompt("You are a helpful assistant.",
             "A store buys items for $40 and sells them at 25% markup. If they sold 15 items and had $120 in expenses, what is the profit?")

# With CoT
good = prompt("You are a helpful assistant.",
              "A store buys items for $40 and sells them at 25% markup. If they sold 15 items and had $120 in expenses, what is the profit? Think step by step.")

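# Correct answer: $30 (sell at $50, revenue $750, cost of goods $600, minus $120 expenses).
# CoT makes each step visible, which tends to reduce arithmetic slips on problems like this.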
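
To find out whether CoT actually helps on your task, score both variants against a known answer. For the store problem the answer can be computed directly: the sale price is $40 x 1.25 = $50, so the profit is ($50 - $40) x 15 - $120 = $30. The sketch below computes that ground truth and pulls the last dollar amount out of a reply; expected_profit and last_dollar_amount are hypothetical helpers, and the regex assumes the model writes amounts with a "$" prefix.

```python
import re
from typing import Optional

def expected_profit(unit_cost: float, markup: float, units: int, expenses: float) -> float:
    """Ground truth for the store problem: per-item margin times units, minus expenses."""
    sale_price = unit_cost * (1 + markup)
    return (sale_price - unit_cost) * units - expenses

def last_dollar_amount(text: str) -> Optional[float]:
    """Return the last $-prefixed number in a model reply, or None if there is none."""
    # Assumes amounts look like "$30", "$1,250" or "$30.50"; the number must end in a digit.
    matches = re.findall(r"\$\s?(-?[\d,]*\d(?:\.\d+)?)", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

truth = expected_profit(unit_cost=40, markup=0.25, units=15, expenses=120)
sample = "Sale price is $50, revenue is $750, cost of goods is $600, so the profit is $30."
print(truth, last_dollar_amount(sample) == truth)
```

Run the bad and good prompts from the example above several times each and count how often last_dollar_amount(reply) equals truth; that gives you a number to compare instead of an impression.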

2. Few-Shot Examples

Show the model exactly what you want by providing 2–3 input/output examples before your actual question. This is one of the most reliable ways to control output format.

few_shot_system = """You extract structured data from text. Output JSON only.

Examples:
Text: "John Smith called from +1-555-0123 about invoice #4521"
Output: {"name": "John Smith", "phone": "+1-555-0123", "reference": "invoice #4521"}

Text: "Meeting with Sarah at 3pm tomorrow re: project Alpha"
Output: {"name": "Sarah", "time": "3pm tomorrow", "topic": "project Alpha"}"""

result = prompt(few_shot_system,
                "Got a message from Mike Torres, his number is 555-9876, asking about order ORD-8821")
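# Expect JSON in the style of the examples. Because the two examples use different keys,
# the exact keys may vary; use one consistent schema across examples when you need fixed keys.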
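
Chat APIs also let you pass examples as earlier conversation turns instead of packing them into the system prompt: each example input becomes a user message and its ideal output becomes an assistant message. A minimal sketch, assuming the same OpenAI-style message format used in Setup; build_few_shot_messages is a hypothetical helper, and the second example is made up for illustration. Both examples share the same keys, which makes the target schema unambiguous.

```python
def build_few_shot_messages(system: str, examples: list[tuple[str, str]], query: str) -> list[dict]:
    """Encode each (input, output) example as a user turn followed by an assistant turn."""
    messages = [{"role": "system", "content": system}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The real question goes last, so the model answers it in the demonstrated style.
    messages.append({"role": "user", "content": query})
    return messages

# Illustrative examples that all use the same three keys.
examples = [
    ("John Smith called from +1-555-0123 about invoice #4521",
     '{"name": "John Smith", "phone": "+1-555-0123", "reference": "invoice #4521"}'),
    ("Got a note from Ana Lee, 555-0199, re: ticket T-77",
     '{"name": "Ana Lee", "phone": "555-0199", "reference": "ticket T-77"}'),
]

messages = build_few_shot_messages(
    "You extract structured data from text. Output JSON only, with the keys name, phone, reference.",
    examples,
    "Got a message from Mike Torres, his number is 555-9876, asking about order ORD-8821",
)
```

The resulting list can be passed directly as the messages argument of client.chat.completions.create.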

3. Role Prompting

Assigning an expert role often improves output quality on domain-specific tasks. The persona steers the model toward the vocabulary, priorities, and level of rigor of that domain.

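# A function to review (it deliberately contains a SQL injection bug)
code = '''
def get_user(db, user_id):
    query = f"SELECT * FROM users WHERE id = {user_id}"
    return db.execute(query)
'''

# Generic
generic = prompt("You are a helpful assistant.",
                 f"Review this Python function for issues:\n{code}")

# Role-specific
expert = prompt(
    "You are a senior Python engineer with 10 years of experience in production systems. You are thorough, direct, and focus on security and scalability issues first.",
    f"Review this Python function for issues:\n{code}"
)
# The expert role tends to produce more actionable, production-focused feedback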
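
The expert prompt above combines three ingredients: a role, a level of experience, and explicit priorities. If you reuse personas across features, a small template keeps them consistent. A minimal sketch; role_system_prompt is a hypothetical helper, not part of any library.

```python
def role_system_prompt(role: str, experience: str, priorities: list[str],
                       style: str = "thorough and direct") -> str:
    """Assemble a persona prompt from a role, an experience level, and ordered priorities."""
    if not priorities:
        raise ValueError("give at least one priority")
    if len(priorities) == 1:
        focus = priorities[0]
    else:
        # "a, b and c": list order signals which concerns come first.
        focus = ", ".join(priorities[:-1]) + " and " + priorities[-1]
    return (f"You are {role} with {experience}. "
            f"You are {style}, and you focus on {focus} issues first.")

system = role_system_prompt(
    role="a senior Python engineer",
    experience="10 years of experience in production systems",
    priorities=["security", "scalability"],
)
print(system)
```

Keeping the parts separate also makes it easy to A/B test one ingredient at a time, for example the same role with different priorities.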

4. Output Format Specification

Explicitly specifying the output format cuts down on parsing work and makes outputs regular enough to use programmatically. A prompt cannot guarantee the format, though, so still validate what comes back.

structured_prompt = """Analyze the sentiment of the given text.
Respond ONLY with a JSON object in this exact format:
{
  "sentiment": "positive" | "negative" | "neutral",
  "confidence": 0.0 to 1.0,
  "key_phrases": ["phrase1", "phrase2"],
  "reasoning": "one sentence explanation"
}"""

import json

result = prompt(structured_prompt, "The product worked as expected but the delivery was incredibly slow.")
try:
    data = json.loads(result)
except json.JSONDecodeError:
    # The prompt makes this format likely, not guaranteed: retry or log the raw reply.
    data = None
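
Valid JSON is not the same as the JSON you asked for: the reply may be wrapped in a Markdown code fence, use an unexpected sentiment label, or put confidence outside 0 to 1. A minimal sketch of a parser that handles the fence and checks the schema from the prompt above; parse_sentiment is a hypothetical helper, and the checks mirror only the fields this example defines.

```python
import json
import re

FENCE = "`" * 3  # built this way so the example can itself sit inside a Markdown code block
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}

def parse_sentiment(raw: str) -> dict:
    """Parse the reply and check it against the schema; raise ValueError if it does not match."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown code fence (optionally tagged json); strip it.
    if text.startswith(FENCE):
        text = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", text)
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data.get('sentiment')!r}")
    confidence = data.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence!r}")
    if not isinstance(data.get("key_phrases"), list):
        raise ValueError("key_phrases must be a list")
    return data
```

Call parse_sentiment(result) instead of json.loads(result) and treat a ValueError as a signal to retry or log. If your provider supports OpenAI's JSON mode (response_format={"type": "json_object"}), enabling it further reduces malformed replies, but schema checks like these are still worthwhile.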

5. Decomposition — Break Hard Problems Into Steps

When a task is too large to handle well in one prompt, have the model plan first, solve each subtask in its own call, and then combine the results. Each call gets a smaller, better-defined job.

def solve_complex_task(task: str) -> dict:
    system = "You are a senior engineer."

    # Step 1: Plan. Ask for one subtask per line so the plan is easy to split.
    plan = prompt(system, "Break this task into 3-5 concrete subtasks. "
                          f"Output one subtask per line and nothing else.\n\nTask: {task}")

    # Step 2: Execute each subtask, keeping the overall task in view.
    results = []
    for subtask in plan.splitlines():
        if subtask.strip():
            result = prompt(system, f"Overall task: {task}\n\nComplete this specific subtask: {subtask.strip()}")
            results.append(result)

    # Step 3: Synthesize the partial results into one answer.
    synthesis = prompt(
        system,
        f"Overall task: {task}\n\nCombine these results into a final solution:\n\n" + "\n\n".join(results)
    )
    return {"plan": plan, "steps": results, "final": synthesis}
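
Models often number or bullet their plans, and plain line-splitting then passes markers like "1." or "-" into every subtask call. The sketch below strips common list markers and blank lines; parse_plan is a hypothetical helper, and an intro line such as "Here is the plan:" would still slip through, so it also helps to ask for one subtask per line in the planning prompt.

```python
import re

# A leading bullet ("-", "*", "+") or list number ("1.", "2)") followed by whitespace.
_LIST_MARKER = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")

def parse_plan(plan: str) -> list[str]:
    """Split a model-written plan into clean subtask strings, dropping blanks and list markers."""
    subtasks = []
    for line in plan.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned:
            subtasks.append(cleaned)
    return subtasks

print(parse_plan("1. Design the schema\n2) Write the API\n\n- Add tests"))
# ['Design the schema', 'Write the API', 'Add tests']
```

Use it where solve_complex_task splits the plan into lines, and iterate over the returned list instead.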

6. Self-Consistency (Run Multiple Times, Take Majority)

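For high-stakes decisions, run the same prompt 3–5 times and take the most common answer. Because each run can follow a different reasoning path, agreement across runs is a useful signal, and majority voting tends to improve accuracy on reasoning tasks. This only works when sampling varies between runs, so leave temperature above 0.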

from collections import Counter

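def self_consistent(question: str, runs: int = 3) -> str:
    answers = []
    for _ in range(runs):
        # prompt() leaves temperature at the API default (1.0 on OpenAI), so runs can differ.
        response = prompt(
            "Answer concisely and directly.",
            f"{question} Think step by step, then give your final answer on the last line starting with 'Answer:'",
        )
        # Keep only the last 'Answer:' line from this run (tolerating leading spaces).
        final = None
        for line in response.splitlines():
            line = line.strip()
            if line.startswith('Answer:'):
                final = line[len('Answer:'):].strip()
        if final:
            answers.append(final)
    if not answers:
        raise ValueError("No run produced an 'Answer:' line")
    # Majority vote across runs.
    return Counter(answers).most_common(1)[0][0]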
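
Exact string matching undercounts agreement when runs phrase the same answer differently, such as "$30" and "30.00". A minimal sketch that normalizes answers before voting and also reports how many runs agreed; normalize_answer and majority_vote are hypothetical helpers, and the normalization rules are assumptions that suit short numeric answers.

```python
import re
from collections import Counter

def normalize_answer(answer: str) -> str:
    """Canonicalize an answer so trivially different forms vote together (assumed rules)."""
    text = answer.strip().lower().rstrip(".")
    text = text.replace("$", "").replace(",", "")
    # Collapse numeric forms like "30.00" and "30" to one key.
    if re.fullmatch(r"-?\d+(?:\.\d+)?", text):
        number = float(text)
        text = str(int(number)) if number.is_integer() else str(number)
    return text

def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the winning normalized answer and the share of runs that gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize_answer(a) for a in answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)

winner, share = majority_vote(["$30", "30.00", "$150."])
print(winner, round(share, 2))  # 30 0.67
```

A low agreement share is a reasonable trigger for extra runs or human review rather than trusting the winner outright.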

7. Negative Prompting — Tell It What NOT to Do

# Without negative prompting — often gets verbose, adds disclaimers
without = prompt("You are helpful.", "Explain recursion.")

# With negative prompting — cleaner output
with_neg = prompt(
    "You are a concise technical writer. Do NOT add disclaimers. Do NOT say 'Great question'. Do NOT use filler phrases. Be direct.",
    "Explain recursion in 3 sentences."
)
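
Negative instructions make unwanted phrasing less likely but do not rule it out. A cheap guard is to scan replies for the phrases you banned and retry or strip them when they slip through. A minimal sketch; find_violations and the BANNED list are hypothetical and should mirror the rules in your own system prompt.

```python
def find_violations(text: str, banned_phrases: list[str]) -> list[str]:
    """Return every banned phrase that appears in text, ignoring case."""
    lowered = text.lower()
    return [phrase for phrase in banned_phrases if phrase.lower() in lowered]

# Hypothetical phrase list mirroring the system prompt's "Do NOT" rules.
BANNED = ["great question", "as an ai", "i hope this helps"]

reply = "Great question! Recursion is when a function calls itself on a smaller input."
print(find_violations(reply, BANNED))  # ['great question']
```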

8. Temperature Control for Consistency vs Creativity

# For deterministic tasks (code, extraction, classification) — low temperature
deterministic = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract all email addresses from: 'Contact john@example.com or jane@test.org'"}],
    temperature=0.1  # Near-zero for reliable extraction
)

# For creative tasks (naming, brainstorming, writing) — higher temperature
creative = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Suggest 10 creative names for a developer tool that helps with API testing"}],
    temperature=0.9  # Higher for variety
)
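
Temperature rescales the model's next-token scores (logits) before they become probabilities: dividing by a small temperature sharpens the distribution toward the top token, and a larger one flattens it so less likely tokens get picked more often. A minimal sketch of standard temperature-scaled softmax with made-up logits for three candidate tokens; hosted APIs apply this internally, and their exact sampling details may differ.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities after dividing them by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; T=0 is typically treated as greedy decoding")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.1 nearly all probability lands on the first token, which is why low temperatures give near-repeatable extraction; at 0.9 the other tokens keep a real chance of being sampled, which is where the variety in creative tasks comes from.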

9. Context Stuffing — Give the Model What It Needs

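LLMs do not have access to your codebase, your docs, or your requirements unless you provide them. Paste the relevant context directly into the prompt. Relevant context generally improves answers, but irrelevant bulk can distract the model and uses up the context window, so include what the task needs rather than everything you have.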

def context_aware_review(code: str, requirements: str, existing_tests: str) -> str:
    return prompt(
        "You are a senior developer reviewing code for production readiness.",
        f"""REQUIREMENTS:
{requirements}

EXISTING TESTS:
{existing_tests}

CODE TO REVIEW:
{code}

Does this implementation satisfy all requirements? What is missing?"""
    )
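
Because the context window is finite, it helps to put the most important material first and cap the total size. A minimal sketch, assuming character count as a rough stand-in for tokens (a real version would count tokens with the model's tokenizer); build_context is a hypothetical helper, and it takes sections in dict insertion order, so list the most important first.

```python
def build_context(sections: dict[str, str], max_chars: int) -> str:
    """Join labeled sections in priority order without exceeding max_chars characters."""
    parts = []
    used = 0
    for label, body in sections.items():  # dicts preserve insertion order
        room = max_chars - used
        if room <= 0:
            break
        block = f"{label.upper()}:\n{body}"
        if len(block) > room:
            block = block[:room]  # hard cut; a real version might cut at a line boundary
        parts.append(block)
        used += len(block) + 2  # + 2 for the blank-line separator that joins blocks
    return "\n\n".join(parts)

ctx = build_context(
    {"requirements": "Must validate emails.", "existing tests": "test_valid_email()", "code": "x" * 500},
    max_chars=80,
)
print(len(ctx))
```

Calling it with the keys "requirements", "existing tests", and "code to review" reproduces the labeled layout of context_aware_review, with the lowest-priority section truncated first when the budget runs out.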

10. Iterative Refinement

When an output is close but not right, feed it back together with specific feedback and ask for a revision, rather than rewriting the prompt from scratch.

def iterative_improve(initial_output: str, feedback: str) -> str:
    return prompt(
        "You improve outputs based on specific feedback.",
        f"""Original output:
{initial_output}

Feedback:
{feedback}

Produce an improved version that addresses all the feedback."""
    )
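
The function above handles one round. To automate several, alternate a critique step with an improvement step until the critic is satisfied or a round limit is reached. A minimal sketch; refine_loop and the "LGTM" stop marker are assumptions, and the critic and improver are passed in as plain functions so the loop can be tested without API calls.

```python
from typing import Callable

def refine_loop(draft: str,
                critique: Callable[[str], str],
                improve: Callable[[str, str], str],
                max_rounds: int = 3,
                done_marker: str = "LGTM") -> str:
    """Alternate critique and revision until the critic approves or rounds run out."""
    for _ in range(max_rounds):
        feedback = critique(draft)
        if done_marker in feedback:  # the critic signals that no changes are needed
            break
        draft = improve(draft, feedback)
    return draft

# Stub functions stand in for model calls so the loop's behavior is easy to see.
def fake_critique(text: str) -> str:
    return "LGTM" if "v2" in text else "Needs more detail."

def fake_improve(text: str, feedback: str) -> str:
    return text + " v2"

print(refine_loop("v1", fake_critique, fake_improve))  # v1 v2
```

To use it for real, pass iterative_improve as improve and, as critique, a function built on prompt() whose system message asks the model for specific feedback and tells it to reply "LGTM" when no changes are needed. The round limit keeps costs bounded if the critic never approves.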

Which Techniques Give the Most Bang for Your Buck?

Ranked by impact-to-effort ratio:

1. Output format specification: instant reliability improvement
2. Chain of thought: huge impact on reasoning tasks
3. Few-shot examples: most reliable format/style control
4. Role prompting: easy to add, meaningful quality bump
5. Negative prompting: eliminates most common annoyances

Start with these five. Test each one on your specific task using a FreeLLMKeys key — the cost is $0, so there is no reason not to experiment extensively.

FreeLLMKeys Team

Building tools for the AI developer community
