@eevmanu
Created June 22, 2025 20:42
simple javascript code to retrieve whole tweet replies from a tweet (working on 20250622)

Here is a complete, self-contained JavaScript script that automates the entire process of scrolling, scraping, and deduplicating the tweet text.

You can copy and paste this entire block of code directly into your browser's developer console and press Enter to run it.

The Automated Scraper Script

/**
 * This script automates the process of scrolling down a page,
 * scraping tweet text, and logging the unique results.
 * It's designed to be pasted directly into the browser console.
 */
(async () => {
  // --- 1. SETUP ---
  // A Set is used to automatically store only unique tweet texts.
  const scrapedText = new Set();
  const maxScrolls = 100; // Safety limit to prevent an infinite loop.
  const scrollDelay = 1000; // 1-second delay to allow new content to load.

  console.log("πŸš€ Starting automated scroll and scrape process...");
  console.log(`- Will scroll a maximum of ${maxScrolls} times.`);
  console.log(`- Pausing for ${scrollDelay / 1000} second(s) after each scroll.`);

  // --- 2. THE LOOP ---
  for (let i = 0; i < maxScrolls; i++) {
    // Store the scroll position *before* we scroll.
    const lastScrollY = window.scrollY;

    // Scrape all visible tweet text containers. X renders a single
    // tweet's text across many child spans (plain text, hashtags,
    // links, emoji), so selecting the `tweetText` container keeps
    // each tweet in one piece instead of collecting fragments.
    const elements = document.querySelectorAll('[data-testid="cellInnerDiv"] [data-testid="tweetText"]');

    // Add the text of each tweet to our Set.
    elements.forEach(el => {
      const text = (el.innerText || el.textContent || "").trim();
      if (text) {
        scrapedText.add(text);
      }
    });

    console.log(`Scrape #${i + 1}: Found ${elements.length} elements on screen. Total unique tweets so far: ${scrapedText.size}`);

    // Scroll down by one viewport height.
    window.scrollBy(0, window.innerHeight);

    // Wait for a moment to let the page load new content.
    await new Promise(resolve => setTimeout(resolve, scrollDelay));

    // Check if the scroll position has changed. If not, we've hit the bottom.
    if (window.scrollY === lastScrollY) {
      console.log("βœ… Reached the bottom of the page. Stopping scroll.");
      break; // Exit the loop.
    }
  }

  // --- 3. FINAL OUTPUT ---
  // Convert the Set to an Array to allow for sorting.
  // The Set has already handled all deduplication efficiently.
  const finalTweets = Array.from(scrapedText).sort();

  console.log("\n\n--- ✨ SCRAPE COMPLETE ✨ ---");
  console.log(`Found a total of ${finalTweets.length} unique tweets.`);
  console.log("---------------------------------\n");

  // Print each unique tweet to the console, separated by a line.
  finalTweets.forEach((tweet, index) => {
    console.log(`[${index + 1}] ${tweet}`);
    console.log("--------------------------------------------------");
  });

})(); // This is an Immediately Invoked Function Expression (IIFE) that runs the async function right away.

How It Works

  1. Main async Function: The entire script is wrapped in an (async () => { ... })();. This is an Immediately Invoked Function Expression (IIFE) that allows us to use await for handling delays. It runs automatically as soon as you paste it in the console.
  2. Setup (Set): We initialize a Set called scrapedText. This data structure is perfect for the task because it only stores unique values, so deduplication happens automatically on every insertion with no manual comparison logic.
  3. The Loop:
    • A for loop is set to run up to 100 times as a safety measure.
    • Scrape & Store: In each iteration, it uses document.querySelectorAll to find all visible tweets and adds their text content to the Set.
    • Scroll: window.scrollBy(0, window.innerHeight) scrolls the page down by exactly the height of the visible window.
    • Wait: await new Promise(resolve => setTimeout(resolve, 1000)) is a modern way to pause execution for 1 second, giving the website time to load new tweets.
    • End Detection: The script detects the bottom of the page by comparing the scroll position (window.scrollY) before and after each scroll attempt. If the position doesn't change, the page can't scroll any further, and the loop is terminated with break.
  4. Final Output:
    • After the loop finishes, Array.from(scrapedText) converts the Set of unique tweets into an array.
    • .sort() is then called on the array to order the tweets alphabetically.
    • Finally, the script prints a summary and then logs each unique, sorted tweet to the console, separated by a distinct line for easy reading, as you requested.
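The await-based pause from step 3 and the dedupe-and-sort logic from step 4 can be exercised in isolation, outside the browser. This is a minimal sketch in plain JavaScript (no DOM APIs; the function names here are illustrative, not part of the script above):

```javascript
// Deduplicate and alphabetically sort scraped strings, mirroring the
// script's final step: a Set drops repeats on insertion, then
// Array.from + sort orders the survivors.
function dedupeAndSort(texts) {
  const unique = new Set();
  for (const t of texts) {
    const trimmed = (t || "").trim();
    if (trimmed) unique.add(trimmed);
  }
  return Array.from(unique).sort();
}

// The await-able delay used between scrolls (step 3, "Wait").
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  const raw = ["b tweet", "a tweet ", "a tweet", "", "b tweet"];
  await sleep(10); // stand-in for the 1-second scroll pause
  console.log(dedupeAndSort(raw)); // → [ 'a tweet', 'b tweet' ]
})();
```

Note that whitespace is trimmed before insertion into the Set, so two copies of a tweet that differ only in trailing spaces still count as one.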