wow-agent-day03: Implementing an Intelligent Grading Agent with OpenAI

Reference: DataWhale wow agent day03
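This post reuses the environment configuration and LLM settings from day02. In particular, GradingOpenAI below calls the OpenAI-compatible client object that day02 creates under the name client.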

Define Helper Functions

Extract the JSON Part from the LLM Output

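import re

def extract_json_content(text):
    # Remove all newlines so the whole reply sits on a single line
    text = text.replace("\n", "")
    # Non-greedy match of everything between the json fence opener and the next closing fence
    pattern = r"```json(.*?)```"
    matches = re.findall(pattern, text, re.DOTALL)
    if matches:
        # Use the first fenced block and trim surrounding whitespace
        return matches[0].strip()
    # No fenced block found: fall back to the newline-stripped text
    return text

Function Explanation:
Extracts and cleans the JSON content embedded in a piece of text.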

Parameters:
    text (str): The input text string, usually containing a JSON code block
    
Returns:
    str: The extracted and cleaned JSON string
    
Function Description:
1. text.replace("\n","") - Removes all newline characters from the text.
2. pattern = r"```json(.*?)```" - Defines a regular expression pattern to match content between ```json and ```.
3. re.findall() - Finds all matches of the pattern. The non-greedy (.*?) stops at the first closing fence, and re.DOTALL lets . match newline characters (redundant after step 1, but harmless).
4. matches[0].strip() - Takes the first match and removes leading and trailing whitespace.
5. If there are no matches, returns the newline-stripped text, not the untouched original.
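To see exactly what the function returns, here is a small check on two invented sample strings (not real model output). The function body is repeated so the snippet runs on its own.

```python
import re


def extract_json_content(text):
    # Same logic as above: drop newlines, then take the first json-fenced block
    text = text.replace("\n", "")
    pattern = r"```json(.*?)```"
    matches = re.findall(pattern, text, re.DOTALL)
    if matches:
        return matches[0].strip()
    return text


# A typical reply shape: prose followed by a fenced JSON block
reply = 'Here is the grade:\n```json\n{\n  "llmgetscore": 8,\n  "llmcomments": "ok"\n}\n```'
print(extract_json_content(reply))  # {  "llmgetscore": 8,  "llmcomments": "ok"}

# Without a fence, only the newlines are removed
print(extract_json_content('{"a": 1,\n"b": 2}'))  # {"a": 1,"b": 2}
```

Because newlines are deleted rather than replaced with spaces, a string value that the model wraps across lines (for example "good\nwork") comes out joined as "goodwork".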

Parse the JSON String into a Python Object

This version of the parser has been modified: instead of calling json.loads once, it tries several strategies in turn before giving up.

import json
import re

class JsonOutputParser:
    def parse(self, result):
        # 1. Try to parse the raw model output directly
        try:
            return json.loads(result)
        except json.JSONDecodeError:
            pass

        # 2. Extract the content of a json code fence and parse that
        cleaned_result = extract_json_content(result)
        try:
            return json.loads(cleaned_result)
        except json.JSONDecodeError:
            pass

        # 3. Apply heuristic repairs for common LLM mistakes
        try:
            # Single-quoted JSON -> double quotes. Heuristic: this also rewrites
            # apostrophes inside string values (e.g. candidate's), which can
            # corrupt otherwise valid text
            fixed_result = cleaned_result.replace("'", '"')
            # Drop trailing commas before a closing brace or bracket
            fixed_result = re.sub(r',\s*([}\]])', r'\1', fixed_result)
            return json.loads(fixed_result)
        except json.JSONDecodeError as e:
            raise ValueError(f"Unable to parse JSON output. Original output: {result}\nError message: {e}") from e

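Function Explanation:

Parameters:
    result (str): Text containing JSON generated by the LLM
Returns:
    dict: Parsed JSON object

Optimization Description:
1. Tries several extraction strategies (the raw text first, then the content of a json code fence) to improve robustness.
2. Applies heuristic repairs for common errors such as single quotes and trailing commas.
3. Raises ValueError when every strategy fails, so the caller can retry; the retry loop itself lives in GradingOpenAI.grade_answer.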
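The first two strategies only succeed when the reply is pure JSON or wrapped in a json code fence. If a model returns bare JSON surrounded by prose, one alternative (not used in this tutorial) is the standard library's json.JSONDecoder.raw_decode, which decodes a JSON value from the start of a string and returns the value together with the index where it ended. The helper name find_first_json_object is hypothetical; this is a sketch, not a drop-in replacement for JsonOutputParser.

```python
import json


def find_first_json_object(text):
    """Return the first JSON object (dict) embedded in text, or None.

    Hypothetical helper: tries json.JSONDecoder.raw_decode at every '{'
    instead of relying on a json code fence.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value at the start of the string and
            # ignores whatever text follows it
            obj, _end = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            continue  # no valid JSON value starts here; keep scanning
        if isinstance(obj, dict):
            return obj
    return None


reply = 'Uses {braces} in prose, then {"llmgetscore": 9, "llmcomments": "good"} and more text'
print(find_first_json_object(reply))  # {'llmgetscore': 9, 'llmcomments': 'good'}
print(find_first_json_object("no JSON here"))  # None
```

Scanning from the left means an outer object is found before any nested one, and stray braces in the prose (such as {braces}) simply fail to decode and are skipped. Re-slicing the string at every '{' is quadratic in the worst case, which is acceptable for short grading replies.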

Define GradingOpenAI
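GradingOpenAI fills its prompt template with str.format, which is why the JSON example inside the template is written with doubled braces ({{ and }}): single braces would be read as replacement fields. A minimal illustration, using a shortened stand-in for the real template:

```python
template = """Output format:
{{
  "llmgetscore": 0
}}
Question: {ques_title}"""

# Doubled braces come out as literal { and }; {ques_title} is substituted
prompt = template.format(ques_title="Explain the preamble.")
print(prompt)

# With single braces, format() treats the JSON text as a field name and fails
broken = 'Output format:\n{\n  "llmgetscore": 0\n}\nQuestion: {ques_title}'
try:
    broken.format(ques_title="Explain the preamble.")
    broken_failed = False
except (KeyError, ValueError):
    broken_failed = True
print(broken_failed)  # True
```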

class GradingOpenAI:
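    def __init__(self):
        # Model name for the OpenAI-compatible endpoint configured in day02
        # (glm-4-flash is a Zhipu AI GLM model)
        self.model = "glm-4-flash"
        self.output_parser = JsonOutputParser()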
        self.template = """You are an expert in grading the Chinese patent agent examination,
skilled at generating scores and comments in Chinese based on the given questions and answers,
and outputting in a specific format.
Your task is to generate scores and comments in Chinese based on the answers provided by the candidates for the questions I input, and return them in JSON format.
The grading criteria should be somewhat lenient; as long as the candidate conveys the basic meaning, they should receive points.
If the answer has numerical annotations, it means that if the candidate answers this knowledge point, they will receive a certain number of points for this question.
The generated comments in Chinese need to be correctly parsed by the json.loads() function.
The entire generated comment in Chinese should be wrapped in English double quotes, and within the wrapped string, please use Chinese double quotes.
The comments in Chinese should not contain newline characters, escape characters, etc.

The output format is JSON:
{{
  "llmgetscore": 0,
  "llmcomments": "Chinese comments"
}}

Compare the student's answer with the correct answer,
and provide a score out of 10 and comments in Chinese. 
Question: {ques_title} 
Answer: {answer} 
Student's reply: {reply}"""

    def create_prompt(self, ques_title, answer, reply):
        return self.template.format(
            ques_title=ques_title,
            answer=answer,
            reply=reply
        )

    def grade_answer(self, ques_title, answer, reply, max_retries=5):
        # Workaround: JsonOutputParser cannot repair every malformed reply,
        # so when parsing fails we ask the model to generate a new one.
        # max_retries bounds the loop so a persistent failure (e.g. a bad
        # API key) raises instead of retrying forever.
        last_error = None
        for attempt in range(1, max_retries + 1):
            try:
                # client is the OpenAI-compatible client created in day02
                response = client.chat.completions.create(
                    model=self.model,
                    messages=[
                        {"role": "system", "content": "You are a professional examination grading expert."},
                        {"role": "user", "content": self.create_prompt(ques_title, answer, reply)}
                    ],
                    temperature=0.7
                )
                result = self.output_parser.parse(response.choices[0].message.content)
                # A missing key also counts as a failed attempt
                return result['llmgetscore'], result['llmcomments']
            except Exception as e:
                last_error = e
                print(f"Attempt {attempt} failed: {e}")
        raise RuntimeError(f"Grading failed after {max_retries} attempts") from last_error

    def run(self, input_data):
        output = []
        for item in input_data:
            score, comment = self.grade_answer(
                item['ques_title'],
                item['answer'],
                item['reply']
            )
            # Copy the item so the caller's input list is not modified
            graded = dict(item)
            graded['llmgetscore'] = score
            graded['llmcomments'] = comment
            output.append(graded)
        return output


grading_openai = GradingOpenAI()
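The comments in grade_answer describe a regenerate-until-parseable strategy. The sketch below isolates that pattern so it can be exercised without an API key: generate is a stand-in for the chat-completion call, and retry_until_parsed, like the fake replies, is hypothetical.

```python
import json


def retry_until_parsed(generate, parse, max_retries=3):
    """Call generate() until parse() accepts its output.

    Returns (parsed_value, attempts_used). Raises RuntimeError once
    max_retries attempts have all produced unparseable output.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        raw = generate()
        try:
            return parse(raw), attempt
        except ValueError as e:  # json.JSONDecodeError subclasses ValueError
            last_error = e
    raise RuntimeError(f"No parseable output after {max_retries} attempts") from last_error


# Fake model replies: the first is not JSON, the second is
fake_outputs = iter([
    "Sorry, here is my grade: score 8",
    '{"llmgetscore": 8, "llmcomments": "ok"}',
])
result, attempts = retry_until_parsed(lambda: next(fake_outputs), json.loads)
print(result, attempts)  # {'llmgetscore': 8, 'llmcomments': 'ok'} 2

# A model that never produces JSON exhausts the retries
failure_message = None
try:
    retry_until_parsed(lambda: "not json", json.loads, max_retries=2)
except RuntimeError as e:
    failure_message = str(e)
print(failure_message)  # No parseable output after 2 attempts
```

Unlike grade_answer, which catches every Exception, this sketch retries only on ValueError (the base class of json.JSONDecodeError), so errors such as network failures propagate immediately instead of consuming retries. Which behavior is preferable depends on whether transient API errors should also be retried.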

Demonstration

Input

# Example input data
input_data = [
 {'ques_title': 'Please explain the meaning of common technical features, distinguishing technical features, additional technical features, and necessary technical features.',
  'answer': 'Common technical features: technical features shared with the closest prior art (2.5 points); Distinguishing technical features: technical features that distinguish from the closest prior art (2.5 points); Additional technical features: technical features that further limit the cited technical features, additional technical features (2.5 points); Necessary technical features: technical features that are essential to solve the technical problem (2.5 points).',
  'fullscore': 10,
  'reply': 'Common technical features: technical features that are the same as the compared technical solution\nDistinguishing technical features: technical features that are different from the compared technical solution\nAdditional technical features: technical features that further limit the cited technical features\nNecessary technical features: technical features that are essential to solve the technical problem'},
 {'ques_title': 'Please explain the preamble, feature part, citation part, and limitation part.',
  'answer': 'Preamble: In independent claims, the subject + technical features shared with the closest prior art, before the features are characterized (2.5 points); Feature part: In independent claims, technical features that distinguish from the closest prior art, after the features are characterized (2.5 points); Citation part: The claim numbers and subjects cited from the claims (2.5 points); Limitation part: Additional technical features from the claims (2.5 points).',
  'fullscore': 10,
  'reply': 'Preamble: technical features that are the same as the prior art in independent claims\nFeature part: technical features that distinguish from the prior art in independent claims\nCitation part: parts that cite other claims from dependent claims\nLimitation part: technical features that further limit the cited claims'}]

Run the agent

graded_data = grading_openai.run(input_data)
print(graded_data)

Result
[{'ques_title': 'Please explain the meaning of common technical features, distinguishing technical features, additional technical features, and necessary technical features.', 'answer': 'Common technical features: technical features shared with the closest prior art (2.5 points); Distinguishing technical features: technical features that distinguish from the closest prior art (2.5 points); Additional technical features: technical features that further limit the cited technical features, additional technical features (2.5 points); Necessary technical features: technical features that are essential to solve the technical problem (2.5 points).', 'fullscore': 10, 'reply': 'Common technical features: technical features that are the same as the compared technical solution\nDistinguishing technical features: technical features that are different from the compared technical solution\nAdditional technical features: technical features that further limit the cited technical features\nNecessary technical features: technical features that are essential to solve the technical problem', 'llmgetscore': 10, 'llmcomments': "The candidate's explanation of common technical features, distinguishing technical features, additional technical features, and necessary technical features is basically correct, accurately expressing the meanings of these concepts, thus receiving full marks."}, {'ques_title': 'Please explain the preamble, feature part, citation part, and limitation part.', 'answer': 'Preamble: In independent claims, the subject + technical features shared with the closest prior art, before the features are characterized (2.5 points); Feature part: In independent claims, technical features that distinguish from the closest prior art, after the features are characterized (2.5 points); Citation part: The claim numbers and subjects cited from the claims (2.5 points); Limitation part: Additional technical features from the claims (2.5 points).', 'fullscore': 10, 'reply': 'Preamble: technical features that are the same as the prior art in independent claims\nFeature part: technical features that distinguish from the prior art in independent claims\nCitation part: parts that cite other claims from dependent claims\nLimitation part: technical features that further limit the cited claims', 'llmgetscore': 8, 'llmcomments': 'The student’s answer is basically correct, the explanations of the preamble and feature part are consistent with the standard answer, and the understanding of the citation part and limitation part is also correct, but the specific order and position of the technical features in the standard answer were not fully expressed, thus receiving 8 points.'}]
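A natural follow-up is to aggregate the per-question scores. The sketch below uses the two scores from the printed result (10 and 8, each out of a fullscore of 10) with shortened question titles; summarize is a hypothetical helper. It coerces llmgetscore with float() in case the model returns the score as a string, and clamps it to [0, fullscore] because the prompt only asks for a score out of 10 and nothing enforces that range.

```python
def summarize(graded):
    """Return (total, possible, percentage) over a list of graded items.

    Hypothetical helper: llmgetscore is coerced with float() and clamped to
    [0, fullscore] before summing.
    """
    total = 0.0
    possible = 0
    for item in graded:
        score = float(item["llmgetscore"])
        score = min(max(score, 0.0), item["fullscore"])
        total += score
        possible += item["fullscore"]
    percentage = 100 * total / possible if possible else 0.0
    return total, possible, percentage


# Scores taken from the printed result (question titles shortened)
graded_data = [
    {"ques_title": "Common / distinguishing / additional / necessary features", "fullscore": 10, "llmgetscore": 10},
    {"ques_title": "Preamble / feature / citation / limitation parts", "fullscore": 10, "llmgetscore": 8},
]
total, possible, pct = summarize(graded_data)
print(f"{total:g}/{possible} ({pct:.0f}%)")  # 18/20 (90%)
```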