Spaces:
Sleeping
Sleeping
Deduplicate extracted content parts
Browse files
Deduplicate extracted content parts because every part showed up twice.
Add `extract_website_content_parts` to the `tools` list.
app.py
CHANGED
|
@@ -34,13 +34,13 @@ def extract_website_content_parts(url: str, extraction_pattern: str) -> List[str
|
|
| 34 |
url: The URL of the website from which content parts should be extracted
|
| 35 |
extraction_pattern: The regular expression string of the content parts to extract from the website
|
| 36 |
Returns:
|
| 37 |
-
List[str]: The content parts matching extraction_pattern of the website `url`
|
| 38 |
"""
|
| 39 |
try:
|
| 40 |
response = requests.get(url)
|
| 41 |
response.raise_for_status()
|
| 42 |
matches: List[str] = re.findall(extraction_pattern, response.text)
|
| 43 |
-
return matches
|
| 44 |
except requests.RequestException as e:
|
| 45 |
return [f"Error fetching website content: {str(e)}"]
|
| 46 |
|
|
@@ -92,7 +92,7 @@ with open("prompts.yaml", 'r') as stream:
|
|
| 92 |
|
| 93 |
agent = CodeAgent(
|
| 94 |
model=model,
|
| 95 |
-
tools=[final_answer, search_tool, get_website_content, get_papers_url_for_date, get_current_time_in_timezone],
|
| 96 |
max_steps=30,
|
| 97 |
verbosity_level=1,
|
| 98 |
grammar=None,
|
|
|
|
| 34 |
url: The URL of the website from which content parts should be extracted
|
| 35 |
extraction_pattern: The regular expression string of the content parts to extract from the website
|
| 36 |
Returns:
|
| 37 |
+
List[str]: The deduplicated content parts matching extraction_pattern of the website `url`
|
| 38 |
"""
|
| 39 |
try:
|
| 40 |
response = requests.get(url)
|
| 41 |
response.raise_for_status()
|
| 42 |
matches: List[str] = re.findall(extraction_pattern, response.text)
|
| 43 |
+
return list(set(matches))
|
| 44 |
except requests.RequestException as e:
|
| 45 |
return [f"Error fetching website content: {str(e)}"]
|
| 46 |
|
|
|
|
| 92 |
|
| 93 |
agent = CodeAgent(
|
| 94 |
model=model,
|
| 95 |
+
tools=[final_answer, search_tool, extract_website_content_parts, get_website_content, get_papers_url_for_date, get_current_time_in_timezone],
|
| 96 |
max_steps=30,
|
| 97 |
verbosity_level=1,
|
| 98 |
grammar=None,
|