Projects

Evaluation Scenario Writer (m/w/d)

100%
290 - 640€/day
Information Technology

AI Evaluation Consultant (m/w/d)

from 95%
440 - 480€/day
Information Technology

Freelance Electrical Engineer with Python Experience (m/w/d)

from 95%
120 - 380€/day
Information Technology

Freelance Automotive Engineer (with Python) - Quality Assurance / AI Trainer

from 95%
120 - 380€/day
Information Technology

Freelance Mechanical Engineer with Python Experience (m/w/d)

from 95%
120 - 380€/day
Information Technology

AI Consultant - Machine Learning (m/w/d)

100%
220 - 360€/day
Information Technology

Vibe Coding Web Scraping Expert (m/f/d)

100%
200 - 240€/day
Information Technology

AI Consultants - Data Science (m/w/d)

Munich, Germany
from 95%
320 - 400€/day
Information Technology
Professional Services

Area Product Manager (m/f/d)

Munich, Germany
up to 80%
750 - 810€/day
Retail
Telecommunication

Senior Project Manager Customer Interaction

Munich, Germany
100%
750 - 800€/day
Information Technology
Professional Services

Development of TM1 Planning Analytics and Interfaces (m/w/d)

Germany
up to 100%
Information Technology

Data Engineer (m/f/d)

Munich, Germany
from 95%
800€/day
Information Technology

Freelance Product Owner for Point Of Sale App

Berlin, Germany
750 - 850€/day
Banking and Finance
Information Technology
Retail

Adobe Experience Cloud Consultant (m/f/d)

Munich, Germany
from 95%
700 - 750€/day
Telecommunication

ERP Transformation Manager (m/f/d)

Eisenach, Germany
40 - 70%
Construction

Senior Cloud Developer TypeScript (m/f/d)

100%
900 - 1100€/day
Information Technology

Expert in Process Automation for Law Firm Environments (m/f/d)

Germany
from 95%
Professional Services

Commissioning & Qualification (C&Q) Engineer (m/f/d)

Munich, Germany
up to 100%
Pharmaceutical

Java IT Architect (m/f/d)

Germany
up to 100%
Banking and Finance

Freelance E-Engineer (m/f/d)

Germany
40 - 60%
Manufacturing

Social Compliance Auditor (m/f/d)

100%
Professional Services

Project Manager (Project Control Focus) (m/f/d)

Germany
up to 90%
Government and Administration

Management Consultant (Senior Level) (m/f/d)

Munich, Germany
up to 100%
900 - 950€/day
Professional Services

Cyber Security Consultant – Product Security & Regulatory Compliance (m/f/d)

Germany
up to 100%
Healthcare

Interim Accounting Lead / Head Of (m/f/d)

Germany
up to 100%

Financial Accountant (m/f/d)

Hamburg, Germany
up to 80%
Cosmetics

Construction Manager according to LBO - Civil and MEP (m/f/d)

Berlin, Germany
800€/day
Construction
Energy
Utilities

Auditor – FSC® and PEFC Chain of Custody (m/f/d)

100%
Manufacturing
Professional Services

ISO 20121 Auditor (w/m/d)

100%
Professional Services

Interim Staff Product Manager (m/w/d)

Berlin, Germany
60 - 80%
100€/day
Information Technology

Safety and Health Protection Coordinator (SiGeKo) and Safety Specialist (SiFa) (m/f/d)

Hamburg, Germany
0%
Construction

Sales Manager for a Media Company (m/f/d)

Hamburg, Germany
from 80%
750 - 830€/day
Information Technology
Professional Services

Senior IT Project Manager (m/w/d) for an Energy Company

Munich, Germany
from 80%
750 - 830€/day
Energy
Information Technology


Evaluation Scenario Writer (m/w/d)

Rate
Daily rate 290 - 640€
Remote work
Remote 100%
Languages
English (Advanced)
Industries
Information Technology
Business areas
Quality Assurance
Description

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You’ll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You’ll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, your typical tasks might include:

  • Designing structured test scenarios based on real-world tasks.
  • Defining the golden path and acceptable agent behavior.
  • Annotating task steps, expected outputs, and edge cases.
  • Working with devs to test your scenarios and improve clarity.
  • Reviewing agent outputs and adapting tests accordingly.

Requirements

  • Bachelor’s and/or Master’s degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or other related fields.
  • Background in QA, software testing, data analysis, or NLP annotation.
  • Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
  • Strong written communication skills in English.
  • Comfortable with structured formats like JSON/YAML for scenario description.
  • Ability to define expected agent behaviors (gold paths) and scoring logic.
  • Basic experience with Python and JS.
  • Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.
  • Ready to learn new methods, switch quickly between tasks and topics, and at times work with challenging, complex guidelines.
  • This freelance role is fully remote, so you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge.

Nice to Have

  • Experience in writing manual or automated test cases.
  • Familiarity with LLM capabilities and typical failure modes.
  • Understanding of scoring metrics (precision, recall, coverage, reward functions).
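
As a rough illustration of the scoring metrics mentioned above, the sketch below compares an agent's logged actions against a scenario's gold path and reports precision and recall. The scenario fields, action names, and the `score_actions` helper are hypothetical and chosen only for this example; they are not taken from the project description, and step order is deliberately ignored to keep the sketch short.

```python
import json

# Hypothetical scenario definition; all field names are illustrative assumptions.
SCENARIO_JSON = """
{
  "id": "refund-request-001",
  "task": "Process a customer refund request",
  "gold_path": ["open_ticket", "verify_order", "issue_refund", "close_ticket"]
}
"""


def score_actions(gold_path, agent_actions):
    """Return precision and recall of agent actions against the gold path.

    Both inputs are treated as sets, so step order and repeats are ignored.
    """
    gold = set(gold_path)
    taken = set(agent_actions)
    matched = gold & taken
    precision = len(matched) / len(taken) if taken else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


scenario = json.loads(SCENARIO_JSON)

# Hypothetical agent log: one gold step is skipped and one extra step is taken.
agent_log = ["open_ticket", "verify_order", "send_survey", "close_ticket"]

result = score_actions(scenario["gold_path"], agent_log)
print(result)  # {'precision': 0.75, 'recall': 0.75}
```

A real scoring scheme would likely also account for step order, partial credit, and edge cases, which this set-based comparison leaves out.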

Frequently asked questions

The project is 100% remote, so you can work from any location.
The project offers a daily rate of 290 - 640€, which breaks down to an hourly rate of 36 - 80€/h.
The project requires the following languages: English (Advanced).
The project is related to the following industry: Information Technology.
The project covers the following business area: Quality Assurance.
Yes! Recommend a freelancer for the project and earn 30% of FRATCH's profits every time they get placed — for the duration of that project. Simply share your invite link with a colleague to get started.
To apply for the project, click the Apply button on the project page to submit your profile for review. We will forward your resume to the client and get back to you within a few days.
