Strategies for comparing routinely collected outcome data across services or systems include focusing on a common indicator (e.g., symptom change) or aggregating results from different measures or outcomes into a comparable core metric. The implications of either approach for judging treatment success are not fully understood. This study drew on naturalistic outcome data from 1641 adolescents with moderate or severe anxiety and/or depression symptoms who received routine specialist care across 60 mental health services in England. The study compared rates of meaningful improvement between the domains of internalizing symptoms, functioning, and progress towards self-defined goals. Consistent cross-domain improvement was observed in only 15.6% of cases. Close to one in four (24.0%) young people with reliably improved symptoms reported no reliable improvement in functioning. Conversely, more than one in three (34.8%) young people reported meaningful goal progress but no reliable symptom improvement. Monitoring systems that focus exclusively on symptom change risk over- or under-estimating actual impact, while aggregating different outcomes into a single metric can mask informative differences in the number and type of outcomes showing improvement. A move towards harmonized outcome measurement approaches across multiple domains is needed to ensure fair and meaningful comparisons.