Gist mikeal/e87fae29728fea1761b7 (last active June 14, 2022 12:36)
The easiest way to get comments out of any code file... seriously?!?
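```javascript
var highlight = require('highlight.js')
var cheerio = require('cheerio')

// Comment-marker and decoration characters (//, #, /* */, <!-- -->, --, \)
// plus spaces, trimmed from the start of each comment
var strip = ['/', '#', ' ', '*', '<', '>', '-', '\\']

function getComments (str) {
  // Auto-detect the language and render the code as HTML with hljs-* classes
  var html = highlight.highlightAuto(str).value
  var $ = cheerio.load(html)
  // highlight.js wraps every comment in <span class="hljs-comment">
  var lines = $('span.hljs-comment').map(function (i, el) {
    return $(el).text()
  }).get()
  // Trim leading delimiter characters (trailing ones such as */ are kept)
  return lines.map(function (l) {
    while (l.length && strip.indexOf(l[0]) !== -1) {
      l = l.slice(1)
    }
    return l
  })
}

module.exports = getComments
```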
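The loop in `getComments` only trims from the front, so a block comment such as `/* note */` comes back as `note */`. A minimal sketch of a helper that trims both ends with the same `strip` list might look like this; `trimDelimiters` is a hypothetical name, not part of the gist.

```javascript
// Same delimiter list as getComments
var strip = ['/', '#', ' ', '*', '<', '>', '-', '\\']

// Hypothetical helper: trim delimiter characters from both ends of a
// comment, so "/* note */" becomes "note" rather than "note */"
function trimDelimiters (l) {
  var start = 0
  var end = l.length
  // Advance past leading delimiters
  while (start < end && strip.indexOf(l[start]) !== -1) start++
  // Back up past trailing delimiters
  while (end > start && strip.indexOf(l[end - 1]) !== -1) end--
  return l.slice(start, end)
}
```

Inside `getComments` this would replace the `while` loop with `return lines.map(trimDelimiters)`. The trade-off is that a comment which legitimately ends in `-` or `>` loses that character too, and since `!` is not in the list, `<!-- hi -->` still comes back as `!-- hi`.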
Curious: did you investigate doing something at the engine level? Surely that's more efficient and guaranteed to capture all sorts of comments.
After lots of investigation, I figured out that this is actually the easiest way in Node.js to get comments out of code files written in any language.
It's pretty ridiculous. It literally involves running the code through a library (highlight.js) that spits out HTML with highlighting classes, then parsing that HTML with cheerio and pulling the text out of the right spans (`span.hljs-comment`).
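To make the "text in the right spans" step concrete, here's a minimal dependency-free sketch of what the cheerio query does. It assumes the highlighter output contains only `<span ...>` and `</span>` tags and that text is escaped with `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&#x27;` (our understanding of highlight.js's escaping, treated here as an assumption). `commentSpans` is a hypothetical name and the sample HTML is hand-written to look like highlight.js output; cheerio handles real HTML far more robustly.

```javascript
// Entities we assume the highlighter uses when escaping code text
var entities = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#x27;': "'" }

function decode (s) {
  // Single pass, so "&amp;lt;" correctly becomes "&lt;"
  return s.replace(/&(amp|lt|gt|quot|#x27);/g, function (m) { return entities[m] })
}

// Hypothetical helper: return the text of every outermost hljs-comment span
function commentSpans (html) {
  // Splitting on a capturing group keeps the tags as separate tokens
  var tokens = html.split(/(<span[^>]*>|<\/span>)/)
  var comments = []
  var depth = 0 // number of spans currently open
  var commentDepth = -1 // depth of the open comment span, or -1 if none
  var current = ''
  tokens.forEach(function (tok) {
    if (tok.indexOf('<span') === 0) {
      depth++
      // Only an outermost comment span starts a new comment; nested spans
      // (e.g. hljs-doctag) just contribute their text
      if (commentDepth === -1 && /\bhljs-comment\b/.test(tok)) {
        commentDepth = depth
        current = ''
      }
    } else if (tok === '</span>') {
      if (depth === commentDepth) {
        comments.push(decode(current))
        commentDepth = -1
      }
      depth--
    } else if (commentDepth !== -1) {
      current += tok
    }
  })
  return comments
}

var html = '<span class="hljs-keyword">var</span> x = <span class="hljs-number">1</span> ' +
  '<span class="hljs-comment">// a &lt;b&gt; note</span>\n' +
  '<span class="hljs-comment">/* <span class="hljs-doctag">TODO:</span> fix */</span>'

// commentSpans(html) returns ['// a <b> note', '/* TODO: fix */']
```

The depth tracking is the part a naive regex would get wrong: a doctag nested inside a block comment would otherwise cut the comment short at the first `</span>`.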
There are a few dozen web-based code editors, and Atom and VS Code are both written in JavaScript, but their parsing code is so deeply embedded in each product that it's impossible to rip out. There are also about three highlighting libraries in JS, including this one. The parse trees they build internally are harder to work with than the HTML output, and there's no way to get just the parse tree without the HTML, so you wouldn't save much on performance by skipping the HTML output anyway.