Minimally-lossy text simplification with Gemini


Gemini-powered automatic evaluation and prompt refinement system

Crafting prompts for nuanced simplification, where readability must improve without sacrificing meaning or detail, is challenging. To meet this challenge, we developed an automated approach that leverages Gemini models both to evaluate simplification quality and to refine the simplification prompt itself, enabling the extensive trial and error needed to discover the most effective prompt.

Automated evaluation

Manual evaluation is impractical for rapid iteration. Our system employs two novel evaluation components:

  1. Readability assessment: Moving beyond simplistic surface metrics like Flesch-Kincaid, we used a Gemini prompt to score text readability on a 1-10 scale. This prompt was iteratively refined against human judgment, enabling a more nuanced assessment of comprehension ease. In our testing, this LLM-based assessment aligned with human readability judgments better than Flesch-Kincaid did (a minimal sketch follows this list).
  2. Fidelity assessment: Ensuring meaning preservation is critical. Using Gemini 1.5 Pro, we implemented a process that maps claims from the original text to the simplified version. This identifies specific error types, such as information loss, gain, or distortion, each weighted by severity, yielding a granular measure of faithfulness to the original meaning in terms of completeness and entailment (a companion sketch follows this list).
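
To make the first evaluator concrete, here is a minimal Python sketch of an LLM-based readability judge. The call_gemini helper, the rubric prompt, and the reply parsing are all illustrative assumptions; the production rubric was itself refined against human judgments and is not reproduced in the post.

```python
import re

def call_gemini(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a Gemini model and return the
    text reply. Substitute your own client (e.g., a Gemini SDK call)."""
    raise NotImplementedError("wire up a real Gemini client here")

# Illustrative rubric; the production prompt was iteratively refined
# against human judgments.
READABILITY_PROMPT = """\
Rate the readability of the following text on a scale from 1 (very hard
to comprehend) to 10 (effortless to read). Consider vocabulary, sentence
structure, and overall clarity. Reply with the number only.

Text:
{text}
"""

def readability_score(text: str) -> int:
    """Score comprehension ease with an LLM judge rather than Flesch-Kincaid."""
    reply = call_gemini(READABILITY_PROMPT.format(text=text))
    match = re.search(r"\d+", reply)  # tolerate extra words around the number
    if not match:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return max(1, min(10, int(match.group())))
```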

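A companion sketch of the claim-mapping fidelity check, reusing the hypothetical call_gemini helper above. The JSON schema, severity weights, and scoring formula are assumptions for illustration; the post specifies only the error types (loss, gain, distortion) and that each is weighted by severity.

```python
import json

# Assumed severity weights per error type; the post does not publish
# its exact weighting scheme.
SEVERITY = {"preserved": 0.0, "loss": 1.0, "gain": 0.5, "distortion": 1.5}

CLAIM_MAP_PROMPT = """\
List the factual claims in the ORIGINAL text and check each against the
SIMPLIFIED text, labeling it "preserved", "loss" (missing), or
"distortion" (meaning changed). Also list any "gain" claims that appear
only in the SIMPLIFIED text. Reply with a JSON array of objects with
fields "claim" and "label", and nothing else.

ORIGINAL:
{original}

SIMPLIFIED:
{simplified}
"""

def fidelity_score(original: str, simplified: str) -> float:
    """Return a 0-1 faithfulness score; 1.0 means every claim is preserved."""
    reply = call_gemini(
        CLAIM_MAP_PROMPT.format(original=original, simplified=simplified))
    claims = json.loads(reply)  # assumes the judge returns bare JSON
    penalty = sum(SEVERITY.get(c["label"], 0.0) for c in claims)
    n_original = max(1, sum(1 for c in claims if c["label"] != "gain"))
    return max(0.0, 1.0 - penalty / n_original)
```
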
Iterative prompt refinement: LLMs optimizing LLMs

The quality of the final simplification (generated by Gemini 1.5 Flash) depends heavily on the prompt. We therefore automated prompt optimization itself via a refinement loop: using the automatic readability and fidelity scores, a Gemini 1.5 Pro model analyzed the simplification prompt’s performance and proposed refined prompts for the next iteration.

This creates a powerful feedback loop: one LLM evaluates the output of another and refines its instructions based on performance metrics (readability and fidelity) and the granular error reports. The system thereby replaces laborious manual prompt engineering with the autonomous discovery of highly effective simplification strategies over hundreds of iterations; for this work, the loop ran for 824 iterations until performance plateaued.
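
The loop itself fits in a few lines. The sketch below is an assumed skeleton built on the hypothetical call_gemini, readability_score, and fidelity_score helpers from the earlier sketches; the score combination, plateau criterion, and refiner prompt are illustrative, not the published method.

```python
import statistics

REFINER_PROMPT = """\
You are optimizing a text-simplification prompt. Current prompt:
{prompt}

Mean readability: {readability:.2f}/10
Mean fidelity: {fidelity:.2f}/1.0

Propose a revised prompt that raises readability without sacrificing
fidelity. Reply with the new prompt only.
"""

def refine_prompt(prompt: str, corpus: list[str], patience: int = 20) -> str:
    """Iteratively rewrite the simplification prompt until scores plateau."""
    best_prompt, best_score, stale = prompt, float("-inf"), 0
    while stale < patience:
        # 1. Simplify each source text with the current candidate prompt
        #    (Gemini 1.5 Flash in the post; call_gemini is our stand-in).
        simplified = [call_gemini(f"{prompt}\n\n{text}") for text in corpus]
        # 2. Auto-evaluate readability and fidelity (sketched earlier).
        readability = statistics.mean(readability_score(s) for s in simplified)
        fidelity = statistics.mean(
            fidelity_score(t, s) for t, s in zip(corpus, simplified))
        score = readability / 10 + fidelity  # naive combination, for illustration
        if score > best_score:
            best_prompt, best_score, stale = prompt, score, 0
        else:
            stale += 1
        # 3. Ask a stronger model (Gemini 1.5 Pro in the post) to analyze the
        #    prompt's performance and propose the next candidate.
        prompt = call_gemini(REFINER_PROMPT.format(
            prompt=prompt, readability=readability, fidelity=fidelity))
    return best_prompt
```

In practice the refiner would also receive the granular error reports from the fidelity judge, which the post highlights as an input to the analysis; they are omitted here to keep the skeleton short.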
