Of course. Here is a complete, self-contained JavaScript script that automates the entire process of scrolling, scraping, and deduplicating the tweet text.
You can copy and paste this entire block of code directly into your browser's developer console and press Enter to run it.
/**
 * This script automates the process of scrolling down a page,
 * scraping tweet text, and logging the unique results.
 * It's designed to be pasted directly into the browser console.
 */
(async () => {
  // --- 1. SETUP ---
  // A Set stores each distinct tweet text only once.
  const scrapedText = new Set();
  const maxScrolls = 100; // Safety limit to prevent an infinite loop.
  const scrollDelay = 1000; // 1-second delay to allow new content to load.

  console.log("Starting automated scroll and scrape process...");
  console.log(`- Will scroll a maximum of ${maxScrolls} times.`);
  console.log(`- Pausing for ${scrollDelay / 1000} second(s) after each scroll.`);

  // --- 2. THE LOOP ---
  for (let i = 0; i < maxScrolls; i++) {
    // Store the scroll position *before* we scroll.
    const lastScrollY = window.scrollY;

    // Select one element per tweet. Selecting the inner spans instead would
    // store fragments of a tweet (links, hashtags) as separate entries.
    // These data-testid values depend on the page's markup and may change.
    const elements = document.querySelectorAll('[data-testid="cellInnerDiv"] [data-testid="tweetText"]');

    // Add the trimmed text of each tweet to our Set, skipping empty strings.
    elements.forEach(tweet => {
      const text = (tweet.textContent || "").trim();
      if (text) {
        scrapedText.add(text);
      }
    });
    console.log(`Scrape #${i + 1}: Found ${elements.length} elements on screen. Total unique tweets so far: ${scrapedText.size}`);

    // Scroll down by one viewport height.
    window.scrollBy(0, window.innerHeight);

    // Wait for a moment to let the page load new content.
    await new Promise(resolve => setTimeout(resolve, scrollDelay));

    // Check if the scroll position has changed. If not, we've hit the bottom.
    if (window.scrollY === lastScrollY) {
      console.log("✅ Reached the bottom of the page. Stopping scroll.");
      break; // Exit the loop.
    }
  }

  // --- 3. FINAL OUTPUT ---
  // Convert the Set to an Array so it can be sorted.
  // The Set has already handled all deduplication.
  const finalTweets = Array.from(scrapedText).sort();
  console.log("\n\n--- ✨ SCRAPE COMPLETE ✨ ---");
  console.log(`Found a total of ${finalTweets.length} unique tweets.`);
  console.log("---------------------------------\n");

  // Print each unique tweet to the console, separated by a line.
  finalTweets.forEach((tweet, index) => {
    console.log(`[${index + 1}] ${tweet}`);
    console.log("--------------------------------------------------");
  });
})(); // This is an Immediately Invoked Function Expression (IIFE) that runs the async function right away.
- Main `async` Function: The entire script is wrapped in `(async () => { ... })();`. This is an Immediately Invoked Function Expression (IIFE) that allows us to use `await` for handling delays. It runs automatically as soon as you paste it in the console.
- Setup (`Set`): We initialize a `Set` called `scrapedText`. This data structure fits the task because it only stores unique values, so the deduplication you requested happens automatically as each text is added.
- The Loop:
  - A `for` loop is set to run up to `100` times as a safety measure.
  - Scrape & Store: In each iteration, it uses `document.querySelectorAll` to find all visible tweets and adds their text content to the `Set`.
  - Scroll: `window.scrollBy(0, window.innerHeight)` scrolls the page down by one viewport height.
  - Wait: `await new Promise(resolve => setTimeout(resolve, 1000))` pauses execution for 1 second, giving the website time to load new tweets.
  - End Detection: The script checks whether it has reached the bottom by comparing the scroll position (`window.scrollY`) before and after the scroll attempt. If the position doesn't change, we can't scroll any further, and the loop is terminated with `break`.
- Final Output:
  - After the loop finishes, `Array.from(scrapedText)` converts the `Set` of unique tweets into an array.
  - `.sort()` is then called on the array to order the tweets. With no comparator it compares UTF-16 code units, so the order is roughly alphabetical, with uppercase letters ahead of lowercase ones.
  - Finally, the script prints a summary and then logs each unique, sorted tweet to the console, separated by a distinct line for easy reading, as you requested.
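
To make the deduplication and sorting steps concrete, here is a minimal sketch that runs without a browser. The `samples` array is hypothetical data standing in for scraped tweet text, and the `localeCompare` comparator is an optional variation that the script above does not use.

```javascript
// Hypothetical sample data standing in for scraped tweet text.
const samples = [
  "  zebra crossing ",
  "Apple pie",
  "apple pie",
  "Apple pie",
  "Zebra crossing",
];

// Trim before adding so whitespace differences do not defeat deduplication.
// The exact duplicate "Apple pie" is stored only once.
const unique = new Set(samples.map(text => text.trim()));

// Default sort compares UTF-16 code units, so uppercase sorts before lowercase.
const defaultOrder = Array.from(unique).sort();
console.log(defaultOrder);
// ["Apple pie", "Zebra crossing", "apple pie", "zebra crossing"]

// Optional variation: ignore case when ordering, so the two "apple pie"
// variants end up next to each other.
const localeOrder = Array.from(unique).sort((a, b) =>
  a.localeCompare(b, "en", { sensitivity: "base" })
);
console.log(localeOrder);
```

Note that the `Set` treats `"Apple pie"` and `"apple pie"` as different entries, because it only removes exact matches. If case-insensitive deduplication is wanted, one option is to normalize the text (for example with `toLowerCase()`) before adding it.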