📊 Detailed Evaluation Reports

Complete transparency — C source, Rust output, compilation errors, and scoring details

ℹ️ About These Reports: Each report includes the original C code, the Rust code generated by Claude and by Gemini, the full compilation output (including errors), and a detailed scoring breakdown across all six evaluation dimensions.

Latest Evaluations (2026-02-16) — With Real Compilation

string_utils

Both compiled

String utility functions: duplicate, trim, lowercase, concat, char count

Claude: 76/100 | Gemini: 81/100

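For comparison with the generated code, here is a minimal sketch of how the five string helpers might map onto idiomatic Rust. It is neither model's output nor the benchmark's reference translation. The function names and signatures are hypothetical, and "char count" is assumed to mean counting occurrences of one character.

```rust
// Hypothetical idiomatic counterparts to the C string helpers (illustration only).

/// "duplicate": an owned copy replaces malloc + strcpy, and is freed automatically.
pub fn duplicate(s: &str) -> String {
    s.to_owned()
}

/// "trim": returns a borrowed sub-slice, so nothing is allocated or mutated.
/// str::trim strips Unicode whitespace, a superset of isspace() in the "C" locale.
pub fn trim(s: &str) -> &str {
    s.trim()
}

/// "lowercase": ASCII-only, leaving non-ASCII characters unchanged,
/// like tolower() in the "C" locale.
pub fn lowercase(s: &str) -> String {
    s.to_ascii_lowercase()
}

/// "concat": the result owns its buffer, sized up front to avoid regrowth.
pub fn concat(a: &str, b: &str) -> String {
    let mut out = String::with_capacity(a.len() + b.len());
    out.push_str(a);
    out.push_str(b);
    out
}

/// "char count" (assumed meaning): number of occurrences of `c` in `s`.
pub fn char_count(s: &str, c: char) -> usize {
    s.chars().filter(|&x| x == c).count()
}
```

Returning `&str` from `trim` is the main design difference from a typical C version: the borrow checker ties the result's lifetime to the input, so no copy is needed.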

buffer

Both compiled

Dynamic buffer backed by a growable byte array

Claude: 77/100 | Gemini: 66/100

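A minimal sketch of one idiomatic Rust shape for this test case, assuming the C version manages a heap pointer plus length and capacity fields and grows them with realloc (the actual C source is in the report). The `Buffer` type and its methods are hypothetical; the point is that `Vec<u8>` already owns the allocation and handles growth.

```rust
/// Hypothetical growable byte buffer. Vec<u8> tracks length and capacity and
/// reallocates on demand, replacing manual malloc/realloc/free bookkeeping.
pub struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    /// Pre-allocates `cap` bytes; appending past this is still allowed.
    pub fn with_capacity(cap: usize) -> Self {
        Buffer { data: Vec::with_capacity(cap) }
    }

    /// Appends bytes; Vec grows its capacity automatically when needed.
    pub fn append(&mut self, bytes: &[u8]) {
        self.data.extend_from_slice(bytes);
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    pub fn is_empty(&self) -> bool {
        self.data.is_empty()
    }

    pub fn capacity(&self) -> usize {
        self.data.capacity()
    }

    /// Read-only view; the borrow cannot outlive the buffer.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data
    }

    /// Resets the length to zero but keeps the allocated capacity.
    pub fn clear(&mut self) {
        self.data.clear();
    }
}
// No Drop impl is needed: the Vec frees its memory when Buffer goes out of scope.
```

Because ownership of the allocation lives in one field, there is no separate free function to forget and no stale pointer after a reallocation.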

hashmap

Both failed

Hash map with separate chaining — borrow checker challenge

Claude: 53/100 (rustc error E0506) | Gemini: 52/100 (rustc errors E0277, E0506)

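E0506 is rustc's "cannot assign to a borrowed value" error, and E0277 reports an unsatisfied trait bound. In a chained hash map, E0506 can arise when code walks `&mut` links of a hand-written linked list while reassigning them. The sketch below sidesteps that by storing each chain as a `Vec<(K, V)>`. It is an illustration under that assumption, not the fix for either model's code; `ChainedMap` and its methods are hypothetical, and hashing uses the standard library's `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical separate-chaining map: one Vec of (key, value) pairs per bucket.
pub struct ChainedMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
    len: usize,
}

impl<K: Hash + Eq, V> ChainedMap<K, V> {
    pub fn new(num_buckets: usize) -> Self {
        let n = num_buckets.max(1); // avoid modulo by zero
        ChainedMap { buckets: (0..n).map(|_| Vec::new()).collect(), len: 0 }
    }

    fn bucket_index(&self, key: &K) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.buckets.len()
    }

    /// Inserts or updates, returning the previous value if the key existed.
    /// `bucket` borrows only the `buckets` field, so updating `self.len`
    /// afterwards does not conflict with it.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        let idx = self.bucket_index(&key);
        let bucket = &mut self.buckets[idx];
        for (k, v) in bucket.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        bucket.push((key, value));
        self.len += 1;
        None
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        let idx = self.bucket_index(key);
        self.buckets[idx]
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v)
    }

    /// Removes a key; swap_remove is O(1) because order within a bucket is irrelevant.
    pub fn remove(&mut self, key: &K) -> Option<V> {
        let idx = self.bucket_index(key);
        let bucket = &mut self.buckets[idx];
        let pos = bucket.iter().position(|(k, _)| k == key)?;
        self.len -= 1;
        Some(bucket.swap_remove(pos).1)
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn is_empty(&self) -> bool {
        self.len == 0
    }
}
```

Using `Vec` chains trades the C version's per-node allocations for contiguous buckets, and the update path uses `std::mem::replace` instead of reassigning through a borrowed node.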

Previous Evaluations (2026-02-15) — Syntax Check Only

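These evaluations used basic syntax checking because rustc was not available, so their scores can differ from the real compilation results above. For example, hashmap scored 72 (Claude) and 74 (Gemini) under syntax checking but 53 and 52 once rustc reported errors.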

string_utils (Feb 15)

Claude: 76/100 | Gemini: 81/100


buffer (Feb 15)

Claude: 77/100 | Gemini: 63/100


hashmap (Feb 15)

Claude: 72/100 | Gemini: 74/100


📖 Report Structure

Each markdown report contains:

- Original C source code with line numbers
- Complete Rust code generated by each model
- Full rustc compilation output (success or errors)
- Detailed scoring breakdown (weights sum to 100%):
  - Compilation (25%): Does it compile?
  - Safety (20%): Memory safety improvements
  - Quality (20%): Idiomatic Rust patterns
  - Correctness (15%): Functional equivalence
  - Maintainability (10%): Code organization
  - Performance (10%): Efficiency patterns
- Side-by-side model comparison table
- Full evaluation methodology
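The six weights add up to 100%, so an overall score out of 100 can be read as a weighted sum. The sketch below assumes each dimension is scored as an integer from 0 to 100 and that the total is rounded half up; the actual rubric and rounding are defined in each report's methodology section, and `DimensionScores`, `WEIGHTS`, and `overall` are hypothetical names.

```rust
/// Hypothetical per-dimension scores, each assumed to be in 0..=100.
pub struct DimensionScores {
    pub compilation: u32,
    pub safety: u32,
    pub quality: u32,
    pub correctness: u32,
    pub maintainability: u32,
    pub performance: u32,
}

/// Weights in percent, in the order listed above; they sum to 100.
pub const WEIGHTS: [u32; 6] = [25, 20, 20, 15, 10, 10];

impl DimensionScores {
    fn as_array(&self) -> [u32; 6] {
        [
            self.compilation,
            self.safety,
            self.quality,
            self.correctness,
            self.maintainability,
            self.performance,
        ]
    }

    /// Weighted total out of 100, rounded half up with integer arithmetic
    /// (sum of score * weight is out of 10_000, so divide by 100).
    pub fn overall(&self) -> u32 {
        let weighted: u32 = self
            .as_array()
            .iter()
            .zip(WEIGHTS.iter())
            .map(|(score, weight)| score * weight)
            .sum();
        (weighted + 50) / 100
    }
}
```

Under these assumptions, a translation that scores 0 on compilation but 100 on every other dimension still totals 75, which shows how much weight the compilation check carries.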