A model produces a confident-looking calculus solution with the wrong final answer. Your job is to find where the reasoning broke. Often the answer is right but the working is fabricated. Both matter.

A model produces a confident-looking solution to a calculus problem. The presentation is clean. The final answer is wrong.
Your job is to find where the reasoning broke. The break is rarely obvious — often it's an algebraic slip three steps back, or a dropped sign in a substitution.
Mathematics review is harder than it looks because models often get the answer right while the working is fabricated. Both matter. A correct answer with bad working is still a failed output — because it can't be trusted at scale.
Pass the Mathematics Cognition Test (60 questions, 78% pass mark) and you're qualified for AI subcontractor work at the STEM tier — up to $40/hr.
A model produces a confident-looking solution to a calculus problem with the wrong final answer. Your job is to find where the reasoning broke. Often the model gets the answer right but the working is fabricated. Both matter.
Lorem ipsum dolor sit amet consectetur at amet felis nulla molestie non viverra diam sed augue gravida ante risus pulvinar diam turpis ut bibendum ut velit felis at nisl lectus.