- Go to https://huggingface.co/papers and click through each of the top 3 upvoted papers.
- For each paper:
  - Record the title, URL and upvotes
  - Summarise the abstract section
- Finally, compile together a summary of all 3 papers, ranked by upvotes

- Go to https://huggingface.co/papers and click through each of the top 3 upvoted papers.
- For each paper:
  - Record the title, URL and upvotes
  - Summarise the abstract section
  - After that go back one page
- Finally, compile together a summary of all 3 papers, ranked by upvotes
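
The last step of the Hugging Face prompt (rank the recorded papers by upvotes and compile a summary) is easy to check against the agent's answer. The sketch below assumes each paper has already been recorded as a title, URL, upvote count and abstract summary. The `PaperRecord` type, the `compile_ranked_summary` helper and the sample values are hypothetical placeholders, not output from any tested model and not part of the agent under test.

```python
from dataclasses import dataclass


@dataclass
class PaperRecord:
    """Hypothetical record of one paper: title, URL, upvotes and abstract summary."""
    title: str
    url: str
    upvotes: int
    summary: str


def compile_ranked_summary(records: list[PaperRecord], top_n: int = 3) -> str:
    """Rank records by upvotes (highest first), keep top_n, render a numbered summary."""
    ranked = sorted(records, key=lambda r: r.upvotes, reverse=True)[:top_n]
    lines = []
    for rank, r in enumerate(ranked, start=1):
        lines.append(f"{rank}. {r.title} ({r.upvotes} upvotes) - {r.url}")
        lines.append(f"   {r.summary}")
    return "\n".join(lines)


# Hypothetical sample data; titles, URLs and upvote counts are made up.
sample = [
    PaperRecord("Paper B", "https://example.org/paper-b", 41, "Summary of B"),
    PaperRecord("Paper A", "https://example.org/paper-a", 97, "Summary of A"),
    PaperRecord("Paper D", "https://example.org/paper-d", 5, "Summary of D"),
    PaperRecord("Paper C", "https://example.org/paper-c", 12, "Summary of C"),
]
print(compile_ranked_summary(sample))
```

Sorting the full list before slicing means the helper still returns the correct top 3 if more than three papers were recorded.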
Go to TechCrunch and extract top 10 headlines from the last 24 hours

| model | nb ver. | result | remark |
|---|---|---|---|
| Ophiuchi-Qwen3-14B | 0.1.5 | 100% | |
| Devstral-small | 0.1.6 | 90% | lists 11 items instead of 10 |
| jan-nano | 0.1.7 | 50% | got 7 articles, but not the most recent ones, and hallucinated 3 |
| Falcon-H1 | 0.1.8 | 90% | only got 5 headlines; missed the remaining 5 from the latest news |
| Qwen3-30b-a3b | 0.1.8 | 75% | only got 8, and loops because it doesn't pass validation |

Go to https://techcrunch.com/latest/ and extract top 10 latest news from the last 24 hours on that page, scroll down when needed

| model | nb ver. | result | remark |
|---|---|---|---|
| jan-nano | 0.1.7 | 80% | got 8 articles and hallucinated 2, probably because it needs to scroll down |
| Qwen3-30b-a3b | 0.1.8 | 0% | tries to get headlines from page 2; the validator catches this, but the planner can't |

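The TechCrunch remarks mostly record two kinds of deviation: the wrong number of headlines (11, 8 or 5 instead of 10) and headlines that are not among the latest ones or were hallucinated. A minimal sketch of a reference check, assuming the true headlines and their publication timestamps are available as `(title, datetime)` pairs. The function names, the window logic and the sample data are hypothetical; this is not the validator used by the agent in these runs.

```python
from datetime import datetime, timedelta, timezone


def latest_headlines(items, now, window_hours=24, limit=10):
    """Return up to `limit` titles published in the last `window_hours`, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [(title, ts) for title, ts in items if cutoff <= ts <= now]
    recent.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in recent[:limit]]


def check_answer(answer, expected):
    """Report a count mismatch and any headlines that are not in the expected list."""
    problems = []
    if len(answer) != len(expected):
        problems.append(f"expected {len(expected)} headlines, got {len(answer)}")
    unexpected = [h for h in answer if h not in expected]
    if unexpected:
        problems.append(f"not among the latest headlines: {unexpected}")
    return problems


# Hypothetical sample: 14 headlines published every 2 hours, going back 26 hours.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
items = [(f"Headline {i}", now - timedelta(hours=2 * i)) for i in range(14)]
expected = latest_headlines(items, now)  # Headline 0 .. Headline 9
answer = expected + ["Headline 10"]      # an 11-item answer, like the Devstral-small run
print(check_answer(answer, expected))
```

The same check applies to the "3 most recent papers" PubMed prompt by changing `limit`.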
Look for the trending Python repositories on GitHub with most stars
go to https://github.com/trending and look for the trending Python repositories with most stars, scroll down if needed.
go to ebay and find the cheapest epyc milan 64-core cpu

| model | nb ver. | result | remark |
|---|---|---|---|
| Ophiuchi-Qwen3-14B | 0.1.5 | 0% | planner message: `Planning failed: Failed to invoke with structured output: Error: Invalid boolean string` |
| Devstral-small | 0.1.8 | 80% | searched and sorted, but returned the 3rd cheapest with wrong seller info |
| jan-nano | 0.1.7 | 0% | doesn't submit the search |
| Falcon-H1 | 0.1.8 | 0% | doesn't click submit |
| Qwen3-30b-a3b | 0.1.8 | 80% | not the cheapest |

go to https://ebay.com fill in the search term "epyc milan 64-core cpu" then submit search. Find the cheapest one by sorting function of the site.
get the 3 most recent papers about cancer from Pubmed
get the 3 most recent papers about cancer from Pubmed use sort function of the site if needed
go to https://pubmed.ncbi.nlm.nih.gov fill cancer in the search field then click submit search. Record the titles of 3 most recent papers on that page.

| model | nb ver. | result | remark |
|---|---|---|---|
| jan-nano | 0.1.7 | 100% | |

How's the weather in NYC?