Detailed Evaluation Reports
Complete transparency: C source, Rust output, compilation errors, and scoring details
About These Reports: Each report includes the original C code, the Rust code generated by both Claude and Gemini, full compilation output (including errors), and detailed scoring breakdowns across all 6 evaluation dimensions.
Latest Evaluations (2026-02-16): With Real Compilation
string_utils
Latest
Both Compiled
String utility functions: duplicate, trim, lowercase, concat, char count
Claude: 76/100 | Gemini: 81/100
buffer
Latest
Both Compiled
Dynamic buffer with growing byte array management
Claude: 77/100 | Gemini: 66/100
hashmap
Latest
Both Failed
Hash map with separate chaining (a borrow checker challenge)
Claude: 53/100 (E0506) | Gemini: 52/100 (E0277, E0506)
Previous Evaluations (2026-02-15): Syntax Check Only
These evaluations used basic syntax checking only, because rustc was not available. Their scores differ from the real compilation results.
Report Structure
Each markdown report contains:
- Original C source code with line numbers
- Complete Rust code generated by each model
- Full rustc compilation output (success or errors)
- Detailed scoring breakdown:
  - Compilation (25%): Does it compile?
  - Safety (20%): Memory safety improvements
  - Quality (20%): Idiomatic Rust patterns
  - Correctness (15%): Functional equivalence
  - Maintainability (10%): Code organization
  - Performance (10%): Efficiency patterns
- Side-by-side model comparison table
- Full evaluation methodology
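
The six weights in the scoring breakdown sum to 100%. The sketch below shows one way an overall score could be derived from them. It assumes each dimension is scored on a 0-100 scale and that the total is a plain weighted sum; the page does not state either, and the names `WEIGHTS` and `weighted_total` are hypothetical, not taken from the CELLO code.

```rust
/// Dimension weights in percent, in the order listed in the report structure.
/// The names and weights come from the list above; the array itself is illustrative.
const WEIGHTS: [(&str, f64); 6] = [
    ("Compilation", 25.0),
    ("Safety", 20.0),
    ("Quality", 20.0),
    ("Correctness", 15.0),
    ("Maintainability", 10.0),
    ("Performance", 10.0),
];

/// Combine per-dimension scores into one overall score.
/// Assumption: each entry of `scores` is on a 0-100 scale and follows the
/// same order as `WEIGHTS`; the result is then also on a 0-100 scale.
fn weighted_total(scores: &[f64; 6]) -> f64 {
    WEIGHTS
        .iter()
        .zip(scores.iter())
        .map(|((_, weight), score)| weight * score)
        .sum::<f64>()
        / 100.0
}
```

Under these assumptions, a translation that scores 100 on every dimension except Compilation, where it scores 0, would get `weighted_total(&[0.0, 100.0, 100.0, 100.0, 100.0, 100.0])`, which is 75.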