— Present
AI Evaluation, Data Quality & Software Engineering
- QA and rubric-based auditing of other contributors' datasets for function-calling and agentic-AI projects, verifying correctness, format compliance, and consistency before delivery.
- Deployment and configuration of local model environments to run frontier models against real tasks and generate datasets, including HFI problem sets.
- Forking and internal extension of open-source tooling (JSON support in Cerberus; multi-layer validation and error detection in Haystack), delivered as part of the dataset.
- RLHF evaluation, pairwise comparisons, and multi-turn prompt design with rubric-based scoring for correctness, reasoning, and instruction-following.