May 9, 2026

Verify the proof, not just the answer.

A model produces a confident-looking calculus solution with the wrong final answer. Your job is to find where the reasoning broke. Often the answer is right but the working is fabricated. Both matter.

The example

A model produces a confident-looking solution to a calculus problem. The presentation is clean. The final answer is wrong.

Your job is to find where the reasoning broke. The break is rarely obvious — often it's an algebraic slip three steps back, or a dropped sign in a substitution.

Why this is the work

Mathematics review is harder than it looks because models often get the answer right while the working is fabricated. Both matter. A correct answer with bad working is still a failed output — because it can't be trusted at scale.

What qualifying looks like

Pass the Mathematics Cognition Test (60 questions, 78% pass mark) and you're qualified for AI subcontractor work at the STEM tier — up to $40/hr.

Jobpeak Editorial

Platform team

A model produces a confident-looking solution to a calculus problem with the wrong final answer. Your job is to find where the reasoning broke. Often the model gets the answer right but the working is fabricated. Both matter.

Newsletter

Subscribe for cutting-edge AI updates

Lorem ipsum dolor sit amet consectetur at amet felis nulla molestie non viverra diam sed augue gravida ante risus pulvinar diam turpis ut bibendum ut velit felis at nisl lectus.

Thanks for subscribing to our newsletter!

Oops! Something went wrong while submitting the form.

Only one email per month — No spam!

Verify the proof, not just the answer.

The example

Why this is the work

What qualifying looks like

Jobpeak Editorial

Subscribe for cutting-edge AI updates

Related articles

Spot the bug a model just shipped.

Verify the proof, not just the answer.

Catch the plausible-but-wrong claim.