Smolagents agent
An AI news aggregator, built with Python Smolagents, that fetches and summarizes the latest news on user-defined interests using DuckDuckGo search and OpenAI models.
src/main.py
"""Module defines the main entry point for the Apify Actor.

Feel free to modify this file to suit your specific needs.

To build Apify Actors, utilize the Apify SDK toolkit, read more at the official documentation:
https://docs.apify.com/sdk/python
"""

import os
import sys
from io import TextIOWrapper

from apify import Actor
from smolagents import CodeAgent, DuckDuckGoSearchTool, OpenAIServerModel

# Configure stdout to use UTF-8 encoding for proper unicode support
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')
else:
    # Fall back to TextIOWrapper for environments where reconfigure is unavailable
    sys.stdout = TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
OPENAI_API_BASE = 'https://api.openai.com/v1'


async def main() -> None:
    """Define a main entry point for the Apify Actor.

    This coroutine is executed using `asyncio.run()`, so it must remain an asynchronous function for proper execution.
    Asynchronous execution is required for communication with Apify platform, and it also enhances performance in
    the field of web scraping significantly.
    """
    async with Actor:
        # Retrieve input parameters from the Apify Actor configuration
        actor_input = await Actor.get_input() or {}

        model = actor_input.get('model')
        if not model:
            raise ValueError('Missing "model" attribute in Actor input!')

        user_interests = actor_input.get('interests')
        if not user_interests:
            raise ValueError('Missing "interests" attribute in Actor input!')

        # Initialize the OpenAI model for text processing
        model = OpenAIServerModel(
            model_id=model,
            api_base=OPENAI_API_BASE,
            api_key=OPENAI_API_KEY,
        )

        # Create the search tool and AI agent
        search_tool = DuckDuckGoSearchTool()
        agent = CodeAgent(tools=[search_tool], model=model)

        # Construct a query using user-defined interests
        query = f'Give me latest news on {", ".join(user_interests)}'

        # Use the agent to fetch search results
        search_results = agent.run(query)
        Actor.log.info('News search operation completed successfully.')

        # Generate a summary of the retrieved news articles
        summary_prompt = f'Summarize the following news articles: {search_results}'
        summary = agent.run(summary_prompt)
        Actor.log.info('News summarization operation completed successfully.')

        # Push the results to the dataset by wrapping it in an object.
        Actor.log.info('The results will be stored in the dataset.')
        await Actor.push_data({'summary': summary})
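A note on the code above: `agent.run()` is a synchronous call, so it blocks the event loop while the agent is working. If that turns out to matter for your Actor, one possible approach is to offload the call to a worker thread with `asyncio.to_thread`. The sketch below is only an illustration under that assumption; `slow_agent_run` and `run_in_thread` are hypothetical names, and the stand-in function replaces `agent.run` so the example runs without network access or an API key.

```python
import asyncio
import time


def slow_agent_run(query: str) -> str:
    """Hypothetical stand-in for a blocking call such as agent.run(query)."""
    time.sleep(0.1)  # simulate network and model latency
    return f'results for: {query}'


async def run_in_thread(query: str) -> str:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the event loop stays free for other coroutines in the meantime.
    return await asyncio.to_thread(slow_agent_run, query)


result = asyncio.run(run_in_thread('Give me latest news on AI'))
print(result)
```

In `main()`, the equivalent change would be to await `asyncio.to_thread(agent.run, query)` instead of calling `agent.run(query)` directly.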
Python Smolagents template
An AI news aggregator that fetches and summarizes the latest news based on user-defined interests using DuckDuckGo search and OpenAI models, built with Python Smolagents.
How it works
This Actor works as an AI-powered news aggregator:
- The user provides a list of topics they are interested in.
- The Actor searches for relevant news articles using DuckDuckGo.
- The retrieved articles are processed and summarized using an OpenAI model.
- The final summarized news output is stored in a dataset.
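The steps above can be sketched as a small function that takes the agent call as a parameter. This is a simplified illustration, not the Actor's actual structure: `aggregate_news` and `fake_agent` are hypothetical names, and the fake agent only records the prompts it receives so the flow can be inspected without calling DuckDuckGo or OpenAI.

```python
from typing import Callable


def aggregate_news(interests: list[str], run_agent: Callable[[str], str]) -> dict[str, str]:
    """Run the search step, then the summary step, and return one dataset record."""
    query = f'Give me latest news on {", ".join(interests)}'
    articles = run_agent(query)
    summary = run_agent(f'Summarize the following news articles: {articles}')
    return {'summary': summary}


prompts_seen: list[str] = []


def fake_agent(prompt: str) -> str:
    """Hypothetical stand-in for agent.run; records prompts instead of calling a model."""
    prompts_seen.append(prompt)
    return f'output {len(prompts_seen)}'


record = aggregate_news(['AI', 'space'], fake_agent)
print(prompts_seen[0])  # the search query built from the interests
print(record)           # the object that would be pushed to the dataset
```

Passing the agent call in as a parameter keeps the two-step flow (search, then summarize) easy to test in isolation.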
How to use
- Provide input: Define your topics of interest by setting the `interests` field in the Actor input.
- Choose an OpenAI model: Specify the OpenAI model to use in the `model` field.
- Run the Actor: Execute the Actor on the Apify platform or locally.
- Retrieve results: The summarized news articles will be available in the default dataset.
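As an illustration of the input described above, the sketch below shows one possible input object and a helper that mirrors the two checks in `src/main.py`. The helper name `validate_input`, the model name `gpt-4o-mini`, and the sample topics are assumptions chosen for the example; use whichever OpenAI model your API key can access.

```python
def validate_input(actor_input: dict) -> tuple[str, list[str]]:
    """Return (model_id, interests), raising ValueError if either field is missing or empty."""
    model_id = actor_input.get('model')
    if not model_id:
        raise ValueError('Missing "model" attribute in Actor input!')

    interests = actor_input.get('interests')
    if not interests:
        raise ValueError('Missing "interests" attribute in Actor input!')

    return model_id, list(interests)


# Example input; the values are placeholders, not defaults of the template.
example_input = {
    'model': 'gpt-4o-mini',
    'interests': ['AI regulation', 'space exploration'],
}

model_id, interests = validate_input(example_input)
print(model_id, interests)
```

An empty `interests` list is rejected in the same way as a missing field, because both are falsy in Python.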
Modifying the Agent
- You can modify the `src/main.py` file to adjust the query structure or change how the results are summarized.
- If needed, you can replace the `DuckDuckGo` search tool with another search API.
- Update the prompt used for summarization to fine-tune the output.
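One way to make the summarization prompt easier to tune is to build it in a small function. The sketch below is only a suggestion: `build_summary_prompt` and its parameters `max_bullets` and `audience` are hypothetical and not part of the template.

```python
def build_summary_prompt(
    articles: str,
    max_bullets: int = 5,
    audience: str = 'a general reader',
) -> str:
    """Build a summarization prompt with a few adjustable settings (all hypothetical)."""
    return (
        f'Summarize the following news articles for {audience} '
        f'in at most {max_bullets} bullet points. '
        'Mention the source of each item where it is available.\n\n'
        f'{articles}'
    )


prompt = build_summary_prompt('Article A. Article B.', max_bullets=3)
print(prompt)
```

In `src/main.py`, the result of such a function could be passed to `agent.run()` in place of the hard-coded `summary_prompt` string.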
Included features
- Apify SDK for Python - a toolkit for building Apify Actors and scrapers in Python
- Input schema - define and easily validate a schema for your Actor's input
- Dataset - store structured data where each object stored has the same attributes
- Smolagents - lightweight AI agent framework