
Evaluating Large Language Models – Evaluation Metrics

Figure: Current major applications of LLMs (source: https://arxiv.org/pdf/2308.05374.pdf)

Metrics for Evaluating Large Language Models

For example, the following prompt asks an evaluator model to rate the coherence of a machine-generated summary of a news article:

Evaluate Coherence in the Summarization Task 
You will be given one summary written for a news article.
Your task is to rate the summary on one metric. Please make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.


Evaluation Criteria:
Coherence (1-5) - the collective quality of all sentences. We align this dimension with the DUC quality question of structure and coherence whereby "the summary should be well-structured and well-organized. The summary should not just be a heap of related information, but should build from sentence to sentence to a coherent body of information about a topic."


Evaluation Steps:
1. Read the news article carefully and identify the main topic and key points.
2. Read the summary and compare it to the news article. Check if the summary covers the main topic and key points of the news article, and if it presents them in a clear and logical order.
3. Assign a score for coherence on a scale of 1 to 5, where 1 is the lowest and 5 is the highest, based on the Evaluation Criteria.


Example:
Source Text: {{Document}}
Summary: {{Summary}}
Evaluation Form (scores ONLY):
Coherence:
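
In practice, a prompt like this is filled in with the source document and the candidate summary, sent to a judge model, and the returned rating is parsed into a numeric score. The Python sketch below shows one way this could be wired up; it is only an illustration, not part of the original article. The call_llm() helper is hypothetical (swap in whichever LLM client you actually use), and the prompt constant abbreviates the full instructions shown above.

```python
# Minimal sketch of scoring summary coherence with an LLM judge, using the
# prompt template shown above. call_llm() is a hypothetical placeholder:
# replace it with a call to whichever LLM API or local model you use.

# The template below abbreviates the full prompt given above; in a real
# evaluator you would paste the complete instruction text here.
COHERENCE_PROMPT = """You will be given one summary written for a news article.
Your task is to rate the summary on one metric (Coherence, 1-5), following
the Evaluation Criteria and Evaluation Steps described in the instructions.

Source Text: {document}
Summary: {summary}
Evaluation Form (scores ONLY):
Coherence:"""


def call_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to your judge model and return
    its raw text completion."""
    raise NotImplementedError("Plug in your preferred LLM client here.")


def coherence_score(document: str, summary: str) -> int:
    """Fill the template, query the judge model, and parse the 1-5 rating."""
    prompt = COHERENCE_PROMPT.format(document=document, summary=summary)
    reply = call_llm(prompt)
    # The prompt asks for the score only, so take the first integer in the
    # reply that falls in the expected 1-5 range.
    for token in reply.replace(":", " ").split():
        if token.isdigit() and 1 <= int(token) <= 5:
            return int(token)
    raise ValueError(f"Could not parse a coherence score from: {reply!r}")
```

Scores obtained this way for many document–summary pairs can then be averaged to compare summarization systems on coherence, and the same pattern can be repeated for any other criterion by changing the evaluation instructions in the template.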



About the author: Preeth P, Machine Learning Engineer