Created February 3, 2013 15:59
Filename sanitizer. Comments welcome.
# clean filenames for OS X, Windows and Linux
# this is for Ruby 1.8 and Rails 2.3
# requires activesupport 2.3.x
class Filename
  CHARACTER_FILTER = /[\x00-\x1F\/\\:\*\?\"<>\|]/u
  UNICODE_WHITESPACE = /[[:space:]]+/u

  def initialize(filename)
    @raw = filename.to_s.freeze
  end

  # strip whitespace on beginning and end
  # collapse intra-string whitespace into single spaces
  def normalize
    @normalized ||= @raw.mb_chars.strip.to_s.gsub(UNICODE_WHITESPACE,' ').mb_chars
  end

  # remove characters that aren't allowed cross-OS
  def sanitize
    @sanitized ||= normalize.to_s.gsub(CHARACTER_FILTER,'')
  end

  # normalize unicode string and cut off at 255 characters
  def truncate
    @truncated ||= sanitize.mb_chars.normalize.mb_chars.slice(0..254)
  end

  # convert back from multibyte string
  def to_s
    truncate.to_s
  end
end
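For readers on a newer Ruby, the same pipeline (collapse whitespace, filter characters, normalize, truncate) can be sketched without ActiveSupport. This is not the author's code: the class name `PortableFilename` is hypothetical, it assumes Ruby 2.2+ (for `String#unicode_normalize`) and UTF-8 input, and it uses NFKC on the assumption that this matches what `mb_chars.normalize` does by default in ActiveSupport 2.3.

```ruby
# Hypothetical port of the Filename class above for Ruby 2.2+.
# Assumes UTF-8 input; a sketch, not the gist author's implementation.
class PortableFilename
  # Control characters plus / \ : * ? " < > |
  CHARACTER_FILTER = /[\x00-\x1F\/\\:*?"<>|]/
  UNICODE_WHITESPACE = /[[:space:]]+/

  def initialize(filename)
    @raw = filename.to_s.dup.freeze
  end

  def to_s
    # Collapse whitespace runs first, then trim the leftover edge spaces
    # (String#strip alone does not remove non-ASCII whitespace).
    collapsed = @raw.gsub(UNICODE_WHITESPACE, ' ').strip
    # Drop characters that are not allowed on every OS.
    cleaned = collapsed.gsub(CHARACTER_FILTER, '')
    # NFKC is an assumption (see above); keep at most 255 characters.
    cleaned.unicode_normalize(:nfkc)[0, 255]
  end
end

puts PortableFilename.new("  a/b:c?.txt  ").to_s
```

On Ruby 1.9 and later, `String#[]` counts characters rather than bytes for UTF-8 strings, so the truncation step does not need `mb_chars`.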
Author
madrobby
commented
Feb 3, 2013
For a laugh I thought I'd strip out almost everything, and with your basic example it still worked, across Ruby 1.8, 1.9, and 2.0, and without the insidious ActiveSupport too! ;-) https://gist.github.com/41cc5c231fe280120034
I think I cheated :-D However, I have to shoot off now, and there are surely trivial ways to break this (Ruby 1.8's handling of Unicode regexes is not my forte), but it might be worth a fiddle anyway if you ever want to run on production releases of MRI.
@peterc without the mb_chars the slicing won't work on Ruby 1.8
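The reason is that `String#slice` on Ruby 1.8 indexes bytes, not characters, so a cut can land in the middle of a multibyte character. A rough illustration on a current Ruby, where `byteslice` stands in for the 1.8 behaviour (the sample filename is made up):

```ruby
# "résumé.txt" written with escapes so the string is UTF-8 regardless of file encoding.
name = "r\u00e9sum\u00e9.txt"   # 10 characters, 12 bytes

by_chars = name.slice(0, 2)      # character-based: "r" plus a whole "é"
by_bytes = name.byteslice(0, 2)  # byte-based: "r" plus only the first byte of "é"

puts by_chars.valid_encoding?    # true
puts by_bytes.valid_encoding?    # false
```

A byte-based cut like the second one leaves a broken trailing character in the filename, which is what `mb_chars` avoids on 1.8.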
This is now a gem! https://github.com/madrobby/zaru