Best LLM Testing

Updated Daily

Rankings use category fit, feature coverage, pricing signals, public reception, and recency. Affiliate relationships do not affect scores.

1 Claude Fable 5

Claude Fable 5 is Anthropic's 2026 flagship model, succeeding the Opus line with stronger long-horizon reasoning, agentic tool use, and code generation. It anchors Claude Code and the Claude API tier for the most demanding tasks, and is widely regarded as the strongest generally available model of i...

2 Burp Suite Professional

Burp Suite Professional is the industry-leading toolkit for web application security testing, used by security professionals and penetration testers worldwide. It provides comprehensive crawling, scanning, and manual testing capabilities with minimal false positives. The scanner detects over 300 vul...

3 OpenAI API

The OpenAI API remains the industry benchmark for immediate access to cutting-edge, general-purpose LLM capabilities. Its unparalleled ease of use, combined with consistently high performance across reasoning, coding, and creative tasks, makes it the default starting point for most new AI applicatio...

4 Claude Sonnet 4.6

Claude Sonnet 4.6 is an advanced AI chatbot developed by Anthropic. It’s notable for its robust performance across diverse tasks including coding, long-form writing, and tool utilization. Designed for enterprise use, it represents a significant step in accessible artificial intelligence capabilities...

5 Azure OpenAI Service

Azure OpenAI Service provides businesses with secure access to OpenAI’s large language models like GPT-4 through Microsoft Azure. It offers enterprise-level features including robust security, compliance certifications, and seamless integration within existing Microsoft environments. This service is...

6 Qwen2.5-Coder

Qwen2.5-Coder is a powerful open-source large language model specifically optimized for code generation and understanding, with a strong emphasis on multilingual capabilities. Its training data includes vast amounts of code in multiple languages, including Chinese, making it particularly well-suited...

7 Claude 3 Opus

Claude 3 Opus was the flagship of Anthropic's Claude 3 model family, designed for exceptional intelligence and nuanced understanding. It excels in creative writing, complex reasoning, and generating human-like responses. Its 200,000 token context window allows for processing extensive documents and maintaining context in...

8 Fedora

Fedora is a prominent open-source Linux distribution developed primarily by Red Hat. It’s notable for its focus on incorporating the latest software innovations and technologies before they become mainstream. Fedora serves as an important platform for developers, testers, and those interested in exp...

9 Web Developer
Free Plan Available

The Web Developer extension provides a suite of tools for web developers to inspect, debug, and manipulate web pages. It includes features like CSS editor, JavaScript console, and element selector. While primarily designed for developers, it can also be useful for advanced users who need detailed co...

10 Mail-Tester

Mail-Tester is a remarkably simple yet powerful free tool for quickly assessing email deliverability. It generates a temporary email address and provides a detailed report analyzing your email's spam score, authentication records (SPF, DKIM, DMARC), and inbox placement across various providers like...

11 kind

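kind (Kubernetes in Docker) runs local Kubernetes clusters using Docker containers as nodes, mainly for development and testing. It builds on containerization, a virtualization method in which applications and their dependencies are packaged into standardized units ("containers") for portability and isolation across different computing environments.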

12 Postman
Free Plan Available From $49/mo

Postman has evolved from a simple Chrome extension into the ubiquitous platform for API development and testing, defining the category for many. It excels as an interactive environment for designing, debugging, and documenting APIs. Its core strength lies in the seamless workflow from manual explora...

13 Jest

Jest, developed by Facebook, is a comprehensive JavaScript testing framework known for its zero-configuration setup and powerful features like mocking, snapshot testing, and built-in assertion libraries. Its ease of use, especially with React, allows developers to write robust unit and integration t...

14 minikube

Minikube is a lightweight Kubernetes distribution that allows users to run a single-node cluster locally on their own machine for development and testing purposes.

15 Mistral 7B Instruct

Mistral 7B Instruct is a powerful open-source language model renowned for its impressive performance and efficiency. Trained on a massive dataset, it excels at following instructions and generating high-quality text across various tasks, including creative writing, code generation, and question answ...

16 WISC-V

The Wechsler Intelligence Scale for Children (WISC-V) is the primary tool for assessing cognitive ability in children aged 6 to 16. It provides a Full Scale IQ and evaluates five core indices: Verbal Comprehension, Visual Spatial, Fluid Reasoning, Working Memory, and Processing Speed. It is vital fo...

17 LangSmith

While not an agent builder itself, LangSmith is critical infrastructure for *building* and *improving* agents. It provides end-to-end observability, allowing developers to trace every step, input, and output of a complex agent run. This capability is invaluable for debugging hallucinations, optimizi...

18 CXL Institute - Conversion Optimization Mini-Master

The CXL Institute's Conversion Optimization Mini-Master is a focused program designed to teach the principles and practices of conversion rate optimization. It covers topics like A/B testing, user experience (UX), analytics, and persuasive design. While not a full digital marketing course, it's a va...

19 JMeter

Apache JMeter is a long-standing, open-source load testing tool widely used for evaluating web application performance. While its user interface can feel dated, JMeter's extensive plugin ecosystem and support for a wide range of protocols (HTTP, FTP, JDBC, LDAP, JMS, etc.) make it incredibly versatil...

20 TPGi ARC Toolkit

TPGi's ARC Toolkit is a web analysis platform providing detailed reports on website accessibility, compliance with WCAG guidelines, and adherence to various digital standards through automated testing.

21 UserTesting Human Insight Platform

UserTesting's platform enables businesses to gather qualitative feedback on websites and digital products via live video sessions and asynchronous user testing with diverse participants.

22 DeepSeek V4 Pro

DeepSeek V4 Pro is an advanced AI chatbot developed by DeepSeek. It’s notable for delivering strong reasoning and coding capabilities while significantly reducing computational costs compared to leading models. This makes it suitable for developers, researchers, and businesses requiring reliable AI...

23 Llama 3 8B (via Ollama)

Llama 3 8B represents a massive leap in general reasoning and instruction following for local models. While not exclusively a coding model, its superior coherence and ability to follow complex, multi-step instructions make it excellent for complex refactoring suggestions or generating detailed docum...

24 Statsig

Statsig is a developer-centric experimentation platform that bridges the gap between product engineering and data science. It provides robust feature flagging, A/B testing, and real-time analytics. Unlike many marketing-focused tools, Statsig is built to be integrated directly into the application c...

25 Fluke 124 Digital Multimeter

The Fluke 124 is a highly regarded digital multimeter known for its accuracy and reliability. It features a large, easy-to-read display, auto-ranging capabilities, and a wide range of measurement functions including voltage, current, resistance, continuity, and diode testing. Its robust construction...

26 Docker Compose (v2)

While not an orchestrator for production clusters, Docker Compose remains the gold standard for defining and running multi-container applications locally. Its v2 integration with the Docker CLI makes defining complex local stacks (database, backend, frontend) incredibly straightforward. It is indisp...

27 Playwright

Playwright is a powerful end-to-end testing framework for modern web applications. Developed by Microsoft, it allows developers to write scripts that automate browser interactions across Chromium, Firefox, and WebKit engines. It features high-speed execution, auto-waiting logic to reduce flakiness,...

28 bolt.diy

Bolt.diy is a free, open-source AI coding tool, the community version of Bolt.new, that lets users prompt, edit, and run full-stack web applications in the browser using an LLM of their choice.

29 MicroK8s

MicroK8s is a lightweight, single-package Kubernetes distribution designed for development and testing. It's incredibly easy to install and use, providing a simplified environment for experimenting with containerization concepts without the overhead of a full-blown Kubernetes cluster. Ideal for devel...

30 Continue (VS Code Extension)

Continue acts less as a direct completion tool and more as a universal, customizable interface for connecting to various local or remote LLMs (like Llama 3 or GPT-4). This flexibility is its greatest strength, allowing developers to test the best model for a specific task without switching IDEs or p...
