What Vera is
Vera is magicpin's AI assistant for merchant growth. It helps merchants improve listings, run campaigns, and reply faster.
No Resume | No Interview | No Experience Required
Win the AI Challenge & join the core team to build India’s Largest Retailer AI, VERA
The task
Build a deterministic compose(category, merchant, trigger, customer?) function.
It should return the next message, CTA, send-as identity, suppression key, and rationale.
Use merchant context to decide what Vera should say next. Make every message specific, useful, and easy to reply to.
- category: the right tone, offer patterns, seasonal moments, and what to avoid.
- merchant: business identity, performance signals, live offers, and conversation history.
- trigger: why this message should go now (recall, spike, dip, research, or festival).
- customer (optional): context for direct outreach (relationship, consent, status, and preference).
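The compose contract above can be sketched in Python. The inner field names assumed below (`identity.name`, `trigger["type"]`, `trigger["reason"]`) are illustrative, not the challenge's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ComposeResult:
    body: str              # the next message
    cta: str               # one clear call to action
    send_as: str           # identity to send as
    suppression_key: str   # dedupe key so the same nudge is not sent twice
    rationale: str         # why this message goes now

def compose(category: dict, merchant: dict, trigger: dict,
            customer: Optional[dict] = None) -> ComposeResult:
    """Deterministic: identical inputs must always produce identical output."""
    name = merchant.get("identity", {}).get("name", "there")
    reason = trigger.get("reason", "trigger fired")
    # Ground the body in real merchant facts from the payload; never invent claims.
    body = f"{name}, quick one: {reason}. Want a short draft to send?"
    return ComposeResult(
        body=body,
        cta="open_ended",
        send_as="vera",
        suppression_key=f"{trigger['type']}:{merchant['id']}",
        rationale=reason,
    )
```

Because there is no randomness and no hidden state, calling `compose` twice with the same inputs returns an equal `ComposeResult`.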
How scoring works
Each dimension is scored from 0–10 (10 = highest score).
Can your bot pick the best signal for this moment? Great outputs combine trigger + merchant state + category fit before writing.
- Use real numbers, offers, dates, and local facts from the given input.
- Keep tone true to the business type: clinical, visual, timely, or utility-first.
- Personalize to merchant metrics, the offer catalog, and prior conversation behavior.
- Give one strong reason to reply now with a low-effort next action.
This package includes the LLM-powered judge as judge_simulator.py. Your bot must expose /healthz, /metadata, /context, /tick, and /reply. To run judge_simulator.py, set LLM_PROVIDER, LLM_API_KEY, and BOT_URL.

# Run the judge simulator (after setting the LLM API key)
$ python judge_simulator.py
Your output should stay deterministic for the same input and simulator settings.
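One common way to keep output deterministic while still varying copy is to derive the choice from a hash of the input instead of `random.choice`. A minimal sketch (the function name is our own):

```python
import hashlib

def pick_deterministic(options: list, *key_parts: str):
    """Stable choice: a hash of the input decides, not a random draw,
    so the same merchant + trigger always gets the same variant."""
    digest = hashlib.sha256("|".join(key_parts).encode("utf-8")).hexdigest()
    return options[int(digest, 16) % len(options)]
```

Keying on merchant and trigger IDs means two runs of the same scenario produce the same message, which is exactly what the harness checks.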
Strong bots do not repeat every available fact. They choose the one signal that should drive the next message.
Engagement means the merchant is likely to reply, not just read. Keep asks short and low-friction, with one clear next step.
Bold does not mean hype. It means a sharp hook from real context, without invented claims.
Generic copy gets penalized. Grounded copy with real merchant facts scores better.
Message craft
Use proof, urgency, curiosity, and one simple yes/no action.
Respect the session rules: one clear CTA per send, no fake claims.
After submission, judges inject new digest items, metric shifts, triggers, and customer contexts.
Dataset
$ tree magicpin-ai-challenge/
magicpin-ai-challenge/
├── challenge-brief.md
├── challenge-testing-brief.md
├── engagement-design.md
├── engagement-research.md
├── dataset/
│ ├── categories/ # 5 verticals: dentists, salons, restaurants, gyms, pharmacies
│ ├── merchants_seed.json # 10 seeds → expanded to 50 merchants
│ ├── customers_seed.json # 15 seeds → expanded to 200 customers
│ ├── triggers_seed.json # 25 seeds → expanded to 100 triggers
│ └── generate_dataset.py # deterministic expansion + 30 canonical test pairs
└── examples/
├── api-call-examples.md
└── case-studies.md # 10 judge-scored anchors
$ python3 dataset/generate_dataset.py --seed-dir dataset --out expanded
# → expanded/ · 50 merchants · 200 customers · 100 triggers · 30 test pairs
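The expansion is deterministic so that every team gets the same base dataset. The actual logic lives in generate_dataset.py; the sketch below only illustrates the general pattern of seeding the RNG once, and all names in it are our own:

```python
import copy
import random

def expand(seeds: list, count: int, seed: int = 42) -> list:
    """Deterministic expansion: a fixed RNG seed means every run
    produces identical output (a sketch, not generate_dataset.py)."""
    rng = random.Random(seed)  # never use the global random module here
    out = []
    for i in range(count):
        base = copy.deepcopy(seeds[i % len(seeds)])
        base["id"] = f"{base['id']}_x{i:03d}"
        base["variant_noise"] = rng.random()  # reproducible across runs
        out.append(base)
    return out
```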
Submit a public bot URL. A one-page
README.md can explain approach, model choice, and tradeoffs.
Testing setup
magicpin's harness calls your URL, sends context, simulates replies, and scores each output. Your bot should maintain state across calls, respond quickly, and stay grounded in the context it has received.
# Push merchant context (idempotent by scope + version)
$ curl -sS https://your-bot.example/v1/context \
    -H "Content-Type: application/json" \
    -d '{
      "scope": "merchant",
      "context_id": "m_001_drmeera",
      "version": 3,
      "payload": { "identity": {...}, "performance": {...}, "offers": [...] },
      "delivered_at": "2026-04-29T10:00:00Z"
    }'

# Response — 200 OK
{ "accepted": true, "ack_id": "ack_abc123", "stored_at": "2026-04-29T10:00:00.123Z" }
# Re-posting the same version is a no-op. Higher version replaces atomically.
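The no-op / atomic-replace semantics can be sketched as an in-memory store keyed by (scope, context_id). A real bot would persist this; the class name is our own:

```python
class ContextStore:
    """Re-posting the same version is a no-op, a higher version replaces
    the entry atomically, and an older version is ignored."""

    def __init__(self):
        self._store = {}  # (scope, context_id) -> (version, payload)

    def put(self, scope: str, context_id: str, version: int, payload: dict) -> bool:
        key = (scope, context_id)
        current = self._store.get(key)
        if current is not None and version <= current[0]:
            return False  # no-op: same or older version
        self._store[key] = (version, payload)
        return True

    def get(self, scope: str, context_id: str):
        entry = self._store.get((scope, context_id))
        return entry[1] if entry else None
```

Returning a boolean from `put` makes it easy to set `"accepted"` in the /v1/context response.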
# Periodic wake-up — your bot decides what to send
$ curl -sS https://your-bot.example/v1/tick \
    -d '{
      "now": "2026-04-29T10:30:00Z",
      "available_triggers": ["trg_research_digest_dentists"]
    }'
# Response — ≤ 20 actions/tick
{
  "actions": [{
    "merchant_id": "m_001_drmeera",
    "trigger_id": "trg_research_digest_dentists",
    "body": "Dr. Meera, your CTR is 2.1% vs 3.0% South Delhi peer median. You already have Dental Cleaning @ ₹299. Want me to draft a 160-char patient message around it?",
    "cta": "open_ended",
    "suppression_key": "research:dentists:2026-W17"
  }]
}
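One way to build a period-scoped suppression key like the one above is to bucket by ISO week, so the same (kind, segment) nudge fires at most once per week. Note that week-numbering conventions vary, so the exact index may differ from the sample's W17:

```python
from datetime import datetime

def suppression_key(kind: str, segment: str, now_iso: str) -> str:
    """Bucket a nudge by ISO week, e.g. "research:dentists:2026-W18",
    so duplicate sends within the same week share one key."""
    now = datetime.fromisoformat(now_iso.replace("Z", "+00:00"))
    year, week, _ = now.isocalendar()
    return f"{kind}:{segment}:{year}-W{week:02d}"
```

Two ticks in the same ISO week produce the same key, which is what lets the harness suppress the repeat.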
# Merchant or customer replied — bot returns send / wait / end within 30s
$ curl -sS https://your-bot.example/v1/reply \
    -d '{
      "conversation_id": "conv_001",
      "from_role": "merchant",
      "message": "Yes, send me the abstract",
      "turn_number": 2
    }'
# Three valid actions: send, wait, end
{
  "action": "send",
  "body": "Sending now — also drafted a 90-sec patient-ed WhatsApp...",
  "rationale": "Honoring the accept; adding a low-friction next-best step"
}
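A minimal reply policy mapping inbound text to the three valid actions might look like the sketch below. The keyword lists are illustrative assumptions, not the judge's rubric:

```python
def decide_reply(from_role: str, message: str, turn_number: int) -> dict:
    """Map an inbound reply to one of the three valid actions: send, wait, end."""
    text = message.lower()
    # Check opt-outs first so "stop" always wins over an earlier "yes".
    if any(w in text for w in ("stop", "unsubscribe", "not interested")):
        return {"action": "end", "rationale": "explicit opt-out"}
    if any(w in text for w in ("yes", "send", "sure", "ok")):
        return {"action": "send",
                "body": "Sending now.",
                "rationale": "honoring an accept with a low-friction next step"}
    return {"action": "wait", "rationale": "ambiguous intent; avoid over-messaging"}
```

A production bot would classify intent with the model rather than keywords, but the send / wait / end envelope stays the same.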
# Liveness probe — three consecutive failures disqualify the run
$ curl -sS https://your-bot.example/v1/healthz
{
  "status": "ok",
  "uptime_seconds": 3600,
  "contexts_loaded": { "category": 5, "merchant": 50, "customer": 200, "trigger": 100 }
}
# Team identity for the leaderboard
$ curl -sS https://your-bot.example/v1/metadata
{
  "team_name": "Team Alpha",
  "team_members": ["Alice", "Bob"],
  "model": "claude-opus-4-7",
  "approach": "single-prompt composer with retrieval",
  "version": "1.2.0"
}
1. Health and metadata checks, then base context load: categories, merchants, customers.
2. 60 simulated minutes: every 5 minutes, the judge pushes updates and calls /v1/tick.
3. Fresh digest items, metric shifts, new triggers, and surprise customer scopes arrive mid-test.
4. Top 10 bots face auto-replies, intent transitions, and hostile/off-topic scenarios.
5. Teams receive message scores, logs, transcripts, a timeline, and judge rationale.
Host the bot on any cloud provider. The submitted public URL must expose all required endpoints.
Before submission, run endpoint checks locally using the Judge Simulator and the sample API calls in examples/api-call-examples.md to validate payload handling, response shape, and timeout behavior.
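As a lightweight offline check, you can validate response shapes against the sample payloads before pointing the simulator at your bot. The required fields assumed below are taken from the sample /v1/healthz response; the function name is our own:

```python
def validate_healthz(resp: dict) -> list:
    """Return a list of problems with a /v1/healthz response (empty = OK)."""
    problems = []
    if resp.get("status") != "ok":
        problems.append("status must be 'ok'")
    loaded = resp.get("contexts_loaded", {})
    for scope in ("category", "merchant", "customer", "trigger"):
        if scope not in loaded:
            problems.append(f"contexts_loaded missing '{scope}'")
    return problems
```

Run it over the JSON your bot actually returns; an empty list means the shape matches the sample.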
Starter package
The zip includes the brief, test contract, dataset generator, API examples, and scored case studies. It matches the package used by the judge harness.
Category contexts plus seed files for merchants, customers, and triggers. generate_dataset.py expands the seeds to 50 merchants, 200 customers, 100 triggers, and 30 canonical test pairs.
Request/response examples and 10 case studies with inputs, outputs, and scorecards.
All teams use the same seed and get the same expanded base dataset.
python3 dataset/generate_dataset.py --seed-dir dataset --out expanded
Implement deterministic message composition from structured JSON context.
Submit a hosted URL where the judge can call the required API endpoints.
Judges inject fresh facts after submission to test adaptability and grounding.
Top bots are replay-tested on replies, objections, auto-replies, and intent handoffs.
Apply now
No long process. Build something real, show your thinking, and submit.
Get the starter zip and review the context format.
Start with one clear end-to-end flow.
Submit one public base URL (example: https://mybot.example.com). The judge will call POST /v1/context, POST /v1/tick, POST /v1/reply, GET /v1/healthz, and GET /v1/metadata.
Share your details and keep your bot live for evaluation.
Frequently Asked Questions (FAQ)
This challenge is a direct quality filter: sharp reasoning, strong product judgment, and reliable execution.
The local judge_simulator gives you a deterministic dry-run on the
30 canonical test pairs. The actual judge harness uses the same scoring logic
but injects new facts you haven't seen — fresh digest items, performance
shifts, surprise customer scopes, replies you can't predict. Your score
depends on how your bot handles those, not on how it does on the 30 pairs.
Bots that pattern-match the simulator will fail. Bots that ground every output
in the context they've actually been given will not.
Signal quality. If your decisions are grounded, deterministic, and useful for merchants, we notice.
A full-time offer — or an internship that converts into a full-time offer. Top candidates join the team; strong performance turns it into a permanent role.
Building the bot is easy. Building one a merchant actually wants to engage with is the hard part — that's the filter.
02 May 2026, 11:59 PM IST — submission closes. Keep your bot live and reachable for the next ~3 days while the judge harness runs scoring on fresh scenarios. Selected candidates hear from us by 5 May 2026.
Same scoring logic, different inputs. The judge_simulator runs
locally and evaluates against the 30 canonical test pairs you can see. The
judge harness — the real evaluation — uses the same scoring
code but injects fresh scenarios you haven't seen. The simulator is for
development confidence. The harness is for the score.
Solo applications only. Students and working professionals are both welcome. The full-time role is based out of our Gurgaon office. Before the final offer, we'll verify that you actually did the work yourself.
No hard cap. Write the message length that best fits the context. Include links only when they add real value for the merchant.
No. We judge the output quality and reasoning quality, not years of experience.
Yes, but each submission is judged on quality. More submissions do not help if the core logic is weak.
Hallucinated facts, generic templates, unstable responses, and broken endpoint behavior.
Specific merchant-aware messaging, sharp decision logic, and consistent behavior across judge scenarios.
Important date
Final submission cut-off for the AI challenge.
Update: Deadline extended. Final cut-off is Sunday, 3 May 2026, 11:00 PM IST.