This article presents a selection of AI models recommended for different types of tasks, to help you identify which models are best suited to a particular workload.

Topic Benchmark Description Rank Model Provider in Globant Enterprise AI
Coding SWE-Bench Verified Evaluates LLMs on real-world software engineering tasks:
- Bug fixing.
- Code generation.
- Multi-file edits on GitHub repositories.
Score = % of issues resolved autonomously.
1 claude-sonnet-4-5-20250929 Anthropic, AWS Bedrock, Vertex AI
2 claude-opus-4-5-20251101 Anthropic, AWS Bedrock, Vertex AI
3 gpt-5.2-2025-12-11 OpenAI, Azure AI Foundry
4 gpt-5.1 OpenAI
5 gemini-3-pro-preview Google Vertex AI
Agentic Agentic Benchmarks Measures autonomous multi-step task execution including:
- Tool use.
- Planning.
- Long-horizon reasoning.
- Sequential decision-making across complex workflows.
1 grok-4-1-fast-reasoning xAI
2 claude-opus-4-6 Anthropic, AWS Bedrock, Vertex AI
3 claude-sonnet-4-6 Anthropic, AWS Bedrock, Vertex AI
4 gemini-3.1-pro-preview Google Vertex AI
5 moonshotai-kimi-k2-thinking OpenRouter
Multilingual MMMLU Massive Multitask Multilingual Language Understanding benchmark that tests knowledge and reasoning across 57 subjects in multiple languages.
Higher = better multilingual comprehension.
1 gemini-3-pro-preview Google Vertex AI
2 claude-opus-4-5-20251101 Anthropic, AWS Bedrock, Vertex AI
3 claude-opus-4-1-20250805 Anthropic, AWS Bedrock, Vertex AI
4 gemini-2.5-pro Google Vertex AI
5 claude-sonnet-4-5-20250929 Anthropic, AWS Bedrock, Vertex AI
Reasoning GPQA Diamond Graduate-Level Google-Proof Q&A. Expert-level questions in:
- Biology.
- Chemistry.
- Physics.
Designed to test deep scientific reasoning.
Score = % correct.
1 gpt-5.2-2025-12-11 OpenAI, Azure AI Foundry
2 gemini-3-pro-preview Google Vertex AI
3 gpt-5.1 OpenAI
4 grok-4 xAI
5 claude-opus-4-5-20251101 Anthropic, AWS Bedrock, Vertex AI
Math AIME 2025 A set of challenging high-school mathematics competition problems that require multi-step algebraic and logical reasoning.
1 gemini-3-pro-preview Google Vertex AI
2 gpt-5.2-2025-12-11 OpenAI, Azure AI Foundry
3 moonshotai-kimi-k2-thinking OpenRouter
4 o3 OpenAI, Azure
5 openai-gpt-oss-20b-maas Google Vertex AI
Visual Reasoning ARC-AGI 2 Abstraction and Reasoning Corpus for AGI.
Tests visual pattern recognition and abstract reasoning on novel tasks never seen during training.
1 claude-opus-4-5-20251101 Anthropic, AWS Bedrock, Vertex AI
2 gpt-5.2-2025-12-11 OpenAI, Azure AI Foundry
3 gemini-3-pro-preview Google Vertex AI
4 gpt-5.1 OpenAI
5 gpt-5 OpenAI, Azure
Best Overall Humanity's Last Exam A collection of the hardest questions across all academic disciplines, designed to be unsolvable by current AI.
Tests overall frontier intelligence.
1 gemini-3-pro-preview Google Vertex AI
2 moonshotai-kimi-k2-thinking OpenRouter
3 gpt-5 OpenAI, Azure
4 grok-4 xAI
5 gemini-2.5-pro Google Vertex AI
Fastest Speed (tokens/sec) Measures inference throughput in tokens per second.
Higher = faster response generation.
Critical for real-time and high-volume applications.
1 llama-4-scout-17b-16e-instruct Cerebras
2 llama-3.3-70b Cerebras
3 llama3.1-8b Cerebras
4 openai-gpt-oss-20b Groq, AWS Bedrock, Vertex AI
5 gemini-2.0-flash Google Vertex AI
Largest Context Context Window Maximum number of tokens a model can process in a single prompt+response.
Larger = better for long documents, codebases, and multi-turn agents.
1 grok-4-fast-non-reasoning xAI
2 grok-4-fast-reasoning xAI
3 grok-4-1-fast-non-reasoning xAI
4 qwen3-coder OpenRouter
5 gemini-3-pro-preview Google Vertex AI
Cheapest Cost (per 1M tokens) Input + Output pricing per million tokens.
Lower = more economical for large-scale deployments.
1 moonshotai/kimi-k2:free OpenRouter
2 openai-gpt-oss-20b Groq, AWS Bedrock, Vertex AI
3 openai-gpt-oss-120b Groq, AWS Bedrock, Vertex AI
4 gemini-2.0-flash-lite Google Vertex AI
5 moonshotai-kimi-k2-thinking OpenRouter
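
The cost metric combines input and output pricing per million tokens. As a rough illustration of how such prices translate into a per-request estimate, here is a minimal sketch. The function name `estimate_cost` and the prices used in the example are assumptions made for illustration; this article lists no actual prices, so check each provider's current rates before relying on any figure.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request from per-million-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Each side is billed separately: tokens / 1,000,000 * price per million.
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices for illustration only; not taken from any provider.
cost = estimate_cost(
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_million=0.50,
    output_price_per_million=2.00,
)
print(cost)
```

Because input and output tokens are usually priced differently, a model that ranks well on combined price may still be expensive for output-heavy workloads.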
Note: As of Version 2026-02, you can ask Iris about recommended models by topic when you are creating a new Agent.
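
Outside of Iris, the same kind of lookup can be sketched in code. The dictionary below copies the three top-ranked models for five of the topics in the table above. `RECOMMENDED` and `recommend` are hypothetical names chosen for this example and are not part of any Globant Enterprise AI API.

```python
# Top three ranked models per topic, copied from the table in this article.
# RECOMMENDED and recommend() are illustrative names, not a platform API.
RECOMMENDED = {
    "coding": [
        "claude-sonnet-4-5-20250929",
        "claude-opus-4-5-20251101",
        "gpt-5.2-2025-12-11",
    ],
    "agentic": [
        "grok-4-1-fast-reasoning",
        "claude-opus-4-6",
        "claude-sonnet-4-6",
    ],
    "multilingual": [
        "gemini-3-pro-preview",
        "claude-opus-4-5-20251101",
        "claude-opus-4-1-20250805",
    ],
    "reasoning": [
        "gpt-5.2-2025-12-11",
        "gemini-3-pro-preview",
        "gpt-5.1",
    ],
    "math": [
        "gemini-3-pro-preview",
        "gpt-5.2-2025-12-11",
        "moonshotai-kimi-k2-thinking",
    ],
}


def recommend(topic: str, top_n: int = 1) -> list[str]:
    """Return the top_n ranked model names for a topic (case-insensitive)."""
    key = topic.strip().lower()
    if key not in RECOMMENDED:
        raise KeyError(f"no recommendations recorded for topic: {topic!r}")
    return RECOMMENDED[key][:top_n]


print(recommend("Coding"))
print(recommend("math", top_n=2))
```

Since the rankings change as new models are released, a hard-coded mapping like this goes stale quickly and should be refreshed from the current version of this article.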
Last update: December 2025 | © GeneXus. All rights reserved. GeneXus Powered by Globant