Smolagents agent

An AI news aggregator that fetches and summarizes the latest news based on user-defined interests, using DuckDuckGo search and OpenAI models. Written in Python with Smolagents.

src/main.py

"""Module defines the main entry point for the Apify Actor.

Feel free to modify this file to suit your specific needs.

To build Apify Actors, utilize the Apify SDK toolkit, read more at the official documentation:
https://docs.apify.com/sdk/python
"""

import os
import sys
from io import TextIOWrapper

from apify import Actor
from smolagents import CodeAgent, DuckDuckGoSearchTool, OpenAIServerModel

# Configure stdout to use UTF-8 encoding for proper unicode support
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')
else:
    # Fall back to TextIOWrapper for environments where reconfigure is unavailable
    sys.stdout = TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
OPENAI_API_BASE = 'https://api.openai.com/v1'


async def main() -> None:
    """Define a main entry point for the Apify Actor.

    This coroutine is executed using `asyncio.run()`, so it must remain an asynchronous function for proper execution.
    Asynchronous execution is required for communication with Apify platform, and it also enhances performance in
    the field of web scraping significantly.
    """
    async with Actor:
        # Retrieve input parameters from the Apify Actor configuration
        actor_input = await Actor.get_input() or {}

        model = actor_input.get('model')
        if not model:
            raise ValueError('Missing "model" attribute in Actor input!')

        user_interests = actor_input.get('interests')
        if not user_interests:
            raise ValueError('Missing "interests" attribute in Actor input!')

        # Initialize the OpenAI model for text processing
        model = OpenAIServerModel(
            model_id=model,
            api_base=OPENAI_API_BASE,
            api_key=OPENAI_API_KEY,
        )

        # Create the search tool and AI agent
        search_tool = DuckDuckGoSearchTool()
        agent = CodeAgent(tools=[search_tool], model=model)

        # Construct a query using user-defined interests
        query = f'Give me latest news on {", ".join(user_interests)}'

        # Use the agent to fetch search results
        search_results = agent.run(query)
        Actor.log.info('News search operation completed successfully.')

        # Generate a summary of the retrieved news articles
        summary_prompt = f'Summarize the following news articles: {search_results}'
        summary = agent.run(summary_prompt)
        Actor.log.info('News summarization operation completed successfully.')

        # Push the results to the dataset by wrapping it in an object.
        Actor.log.info('The results will be stored in the dataset.')
        await Actor.push_data({'summary': summary})
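
In the listing above, `agent.run()` is called without `await`, so it appears to be a blocking call that holds the event loop while the agent works. For a single-purpose Actor that is usually acceptable. If other coroutines need to make progress in the meantime, one option is to move the call to a worker thread. The following is a minimal sketch of that idea only; `slow_agent_run` and `run_in_thread` are hypothetical names, and the stand-in function replaces a real agent so the example runs without network access or an API key.

```python
import asyncio
import time


def slow_agent_run(query: str) -> str:
    """Hypothetical stand-in for a blocking call such as agent.run(query)."""
    time.sleep(0.1)  # Simulate time spent searching and calling the model
    return f'results for: {query}'


async def run_in_thread(query: str) -> str:
    # asyncio.to_thread (Python 3.9+) runs the blocking function in a worker
    # thread, so the event loop stays free for other coroutines meanwhile.
    return await asyncio.to_thread(slow_agent_run, query)


result = asyncio.run(run_in_thread('Give me latest news on AI'))
print(result)
```

Inside `main()`, the same pattern would amount to awaiting `asyncio.to_thread(agent.run, query)`, assuming the agent is safe to call from another thread.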

Python Smolagents template

How it works

This Actor works as an AI-powered news aggregator:

1. The user provides a list of topics they are interested in.
2. The Actor searches for relevant news articles using DuckDuckGo.
3. The retrieved articles are processed and summarized using an OpenAI model.
4. The final summarized news output is stored in a dataset.
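
The steps above can be sketched as a small function with the search and summarization steps passed in as callables. This is an illustration of the data flow, not the Actor's actual structure: `aggregate_news`, `fake_search`, and `fake_summarize` are hypothetical names, and the stubs only echo their input so the sketch runs offline. The query and prompt strings are the ones used in src/main.py.

```python
from typing import Callable


def aggregate_news(
    interests: list[str],
    search: Callable[[str], str],
    summarize: Callable[[str], str],
) -> dict[str, str]:
    """Take topics, search, summarize, and return one dataset item."""
    query = f'Give me latest news on {", ".join(interests)}'
    articles = search(query)
    summary = summarize(f'Summarize the following news articles: {articles}')
    return {'summary': summary}


# Stub callables stand in for the agent; they only echo what they receive.
def fake_search(query: str) -> str:
    return f'[articles for "{query}"]'


def fake_summarize(prompt: str) -> str:
    return f'SUMMARY({prompt})'


item = aggregate_news(['AI', 'space'], fake_search, fake_summarize)
print(item['summary'])
```

In the real Actor both callables would be `agent.run`, and the returned dictionary is what gets passed to `Actor.push_data`.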

How to use

1. Provide input: Define your topics of interest by setting the interests field in the Actor input.
2. Choose an OpenAI model: Specify the OpenAI model to use in the model field.
3. Run the Actor: Execute the Actor on the Apify platform or locally.
4. Retrieve results: The summarized news articles will be available in the default dataset.
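
As a rough illustration of the expected input, the sketch below mirrors the two checks performed in src/main.py and applies them to a sample input. The helper name `parse_actor_input`, the model name `gpt-4o-mini`, and the sample topics are assumptions chosen for the example. The string-to-list conversion is an extra safeguard that is not in the original code: without it, a bare string such as 'AI' would be joined character by character when the query is built.

```python
def parse_actor_input(actor_input: dict) -> tuple[str, list[str]]:
    """Return (model, interests) or raise ValueError, mirroring main()."""
    model = actor_input.get('model')
    if not model:
        raise ValueError('Missing "model" attribute in Actor input!')

    interests = actor_input.get('interests')
    if not interests:
        raise ValueError('Missing "interests" attribute in Actor input!')

    # Extra check, not in the original: wrap a bare string in a list so that
    # ", ".join(...) does not split it into single characters.
    if isinstance(interests, str):
        interests = [interests]
    return model, list(interests)


# Sample input; the model name and topics are illustrative assumptions.
example_input = {'model': 'gpt-4o-mini', 'interests': ['renewable energy', 'robotics']}
model, interests = parse_actor_input(example_input)
print(model, interests)
```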

Modifying the Agent

You can modify the src/main.py file to adjust the query structure or change how the results are summarized.
If needed, you can replace the DuckDuckGo search tool with another search API.
Update the prompt used for summarization to fine-tune the output.
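
One way to make these adjustments easier is to move the two prompt strings into small functions, so the query structure and the summary style can be changed in one place. The sketch below is an assumption about how that might look; `build_query`, `build_summary_prompt`, and their parameters (`max_items`, `style`) are hypothetical and not part of the template.

```python
def build_query(interests: list[str], max_items: int = 5) -> str:
    """Build the search query; max_items is a hypothetical tuning knob."""
    topics = ', '.join(interests)
    return (
        f'Give me the {max_items} latest news items on {topics}. '
        'Include the source URL for each.'
    )


def build_summary_prompt(search_results: str, style: str = 'bullet points') -> str:
    """Build the summarization prompt; style controls the output format."""
    return (
        f'Summarize the following news articles as {style}, '
        f'one entry per article:\n{search_results}'
    )


print(build_query(['AI', 'space'], max_items=3))
print(build_summary_prompt('Article A ... Article B ...', style='short paragraphs'))
```

In src/main.py, the results of these functions would replace the inline f-strings assigned to `query` and `summary_prompt`.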

Included features

Apify SDK for Python - a toolkit for building Apify Actors and scrapers in Python
Input schema - define and easily validate a schema for your Actor's input
Dataset - store structured data where each object stored has the same attributes
Smolagents - lightweight AI agent framework
