[updated: Jan 2nd, 2013 | Updated ‘tiny.pl’ to skip “mailto:” references, which were cluttering email replies]
[updated: Nov 28th, 2012 | Updated ‘tiny.pl’ to be much more efficient, which resulted in a crazy speedup]
Yesterday I received one too many emails with a long URL that I actually needed to click on. Why is this a problem, you wonder? I use Mutt. Yes, I’ve heard it all, and yes, I know it’s not the best email client, but when you spend all day in a terminal it’s simply a pain to launch a browser just to send a single email. It’s “ctrl+a, #, m, type…type, y…sent” versus “open browser, go to URL, log in, compose, type…type, hit send”. Anyway, given that I use Apple’s Terminal.app, which for some reason has not been upgraded in the last 8 years to hot-link URLs everywhere (correction: it does, but it does not always handle right-clicking multi-line links that contain “strange characters and symbols”), I have to suffer. I’ve been toying with the idea of parsing my mutt emails for a while now, and yesterday I finally decided to sit down and write something. My starting point was my .mailcap entry:
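For anyone not familiar with mailcap, the entry in question is just a hook that hands text/html parts to an external filter. A representative entry might look like the lines below; the script path is a placeholder rather than where the real tiny.pl necessarily lives, and it assumes auto_view text/html is set in .muttrc so mutt pages the output inline.

# ~/.mailcap: pipe HTML parts through an external filter;
# copiousoutput lets mutt page the result via auto_view
text/html; ~/bin/tiny.pl %s; copiousoutput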
My first thought was: why not hook up a custom Perl script to parse the text of the email, extract the URLs, and shorten them? After a little bit of work, I realized that I care about the rest of the text too, not just the URLs. The final solution can be found here:
The script starts by NOT reinventing the wheel and uses a text-mode browser dump to parse out the HTML; note that this step is only needed for HTML email, and the same script is overloaded for plain-text email by supplying a second argument, which skips the dump. As you will see, I actually use elinks rather than lynx for the HTML, because lynx, when used with -dump, introduces an extra character on long URLs for some reason, and that broke the shortening. The script then splits the resulting output into lines, and each line into “words”. Each “word” is checked for being a URL; if it is, and it is longer than the “trigger” number of characters, it is shortened and printed along with the original (nice to keep track). Otherwise the word is just printed, and the process repeats until the end.
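The script itself isn’t pasted in this post, but the loop described above boils down to roughly the sketch below. The trigger length and bit.ly credentials are placeholders, the shortening call uses the bit.ly v3 “shorten” endpoint that was current at the time, and the real tiny.pl also decides, based on its second argument, whether to run the elinks dump first, which the sketch leaves out.

#!/usr/bin/perl
# Rough sketch of the idea, not the actual tiny.pl.
use strict;
use warnings;
use LWP::Simple qw(get);
use URI::Escape qw(uri_escape);

my $trigger = 40;               # only shorten URLs longer than this
my $login   = 'BITLY_LOGIN';    # placeholder credentials
my $apikey  = 'BITLY_API_KEY';

# HTML mail is assumed to arrive already dumped to text (e.g. via elinks -dump);
# plain-text mail is read as-is.
while ( my $line = <STDIN> ) {
    chomp $line;
    my @words;
    for my $word ( split /\s+/, $line ) {
        if ( $word =~ m{^https?://\S+}i && length($word) > $trigger ) {
            my $short = get( 'http://api.bitly.com/v3/shorten?format=txt'
                           . "&login=$login&apiKey=$apikey"
                           . '&longUrl=' . uri_escape($word) );
            chomp $short if defined $short;
            # print the short form but keep the original next to it
            $word = "$short [$word]" if defined $short && length $short;
        }
        push @words, $word;
    }
    print "@words\n";
}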
While this is EXTREMELY simple concept-wise, it is very useful. Is there a downside? Yes: some potentially private URLs are now public. Solution? Yes: sign up for bit.ly Pro (free) and use your own domain name. Lastly, I just want to tack on that while searching for an existing solution to this, I did find a program called “urlview”. I haven’t tried it, but it seems like a much better solution. Here’s some more information on it: http://linuxcommand.org/man_pages/urlview1.html
UPDATE: As it turns out, Terminal.app actually picks up some/most/(all?) long URLs. I think it was lynx that was wrapping lines at 65-72 characters, which is what caused the ‘+’ in front of long URLs and the break onto two lines. So basically, if you don’t use lynx for the HTML parsing, you can potentially click on the links. Either way, I still prefer having tinyurls. Also, I did find a bug in the ‘t’ (non-HTML emails) version of the script where, in some cases, it rips out the URL but does not show it at all (neither the original nor the shortened one). I’ve noticed this twice (out of 1000+ emails), but I just haven’t had time to look into it. I have a feeling it’s not really my script: the email comes in containing no HTML other than an <a href> tag, and I think that messes with the detection.
If you are using a terminal that makes URLs clickable, like GNOME Terminal (which Ubuntu also uses), you can just pipe the email to grep. Add this to your .muttrc and press ctrl-b on an email:
set pipe_decode=yes
macro index,pager \cb "<enter-command>set wait_key\n<pipe-message>grep -iPh 'https?://.+?\\s'\n<enter-command>unset wait_key\n" "Show URLs in message"
Nice. Much safer than sending them through a public shortener service too 🙂
I originally wrote this to deal with URLs that wrapped across 2-3 lines while running mutt inside a screen session, and I needed something quick that just worked. Unfortunately, the Apple terminal was very limited at the time and didn’t support clickable URLs, though things like iTerm do.
I’ve fixed up some encoding issues today, and also added a fix for URLs inside angle brackets. Maybe I’ll look deeper into the regex someday, but I’m satisfied for the moment :).
Thanks!
Take a look at the updated script (just the Perl portion). I rewrote it today and I think it is much better and faster now: it handles URLs better, runs in near real time, and displays them much more cleanly.
Nice :).
I’ve modified your script so it will work with a local Apache installation:
http://folk.uio.no/trondth/mutt/shortenurl.pl
Glad you could use it. Something I’ve been meaning to fix/improve is the URL regex parsing. I’ve noticed that URLs have gotten a lot more “interesting”: things that were never present are now starting to be semi-standard (e.g. #, +, !, -, commas, etc.). If you have any interest/time, I would love to see if you can replace the URL portion with something better.
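For anyone who picks this up: rather than hand-rolling the pattern, one low-effort option worth trying is the Regexp::Common module from CPAN. A standalone sketch of testing words against it (not wired into tiny.pl) might look like this:

#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common qw(URI);    # CPAN module

# $RE{URI}{HTTP} only matches http://; the -scheme flag widens it to https as well
my $url_re = $RE{URI}{HTTP}{ -scheme => 'https?' };

while ( my $line = <STDIN> ) {
    for my $word ( split /\s+/, $line ) {
        # peel off trailing punctuation that is usually not part of the URL
        ( my $clean = $word ) =~ s/[).,;>\]]+$//;
        print "URL: $clean\n" if $clean =~ /^$url_re$/;
    }
}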