Archive for the ‘feedbag’ tag
Feedbag 0.6
I just uploaded Feedbag 0.6 to Gemcutter and GitHub.
Just a couple of small but nice additions in this version:
- The undocumented args[:narrow] option has been disabled until further notice.
- Merged a nice little commit from one of Feedbag's forks, by Patrick Reagan.
- Added an executable to find feed URLs directly:
Sometimes you need to find the feed for a URL quickly, without writing a script. What I used to do (and what someone else showed me as well) is this:
~ $ irb
>> require "rubygems"
=> true
>> require "feedbag"
=> true
>> Feedbag.find "http://stereonaut.net"
=> ["http://stereonaut.net/feed", "http://stereonaut.net/tag/feed/", "http://stereonaut.net/comments/feed/"]
>>
But now you can simply do:
~ $ feedbag cnn.com http://twitter.com/compupaisa
== cnn.com:
 - http://rss.cnn.com/rss/cnn_topstories.rss
 - http://rss.cnn.com/rss/cnn_latest.rss
== http://twitter.com/compupaisa:
 - http://twitter.com/statuses/user_timeline/119479806.rss
 - http://twitter.com/favorites/119479806.rss
~ $
Enjoy the feedbag executable on your $PATH now!
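For the curious, the command's behavior can be approximated in a few lines of Ruby. This is a hypothetical sketch, not the executable shipped with the gem: `format_feeds` and `run` are illustrative names, the output layout only mimics the session above, and the finder is injectable so the formatting can be tried without network access (by default it loads the feedbag gem and calls `Feedbag.find`).

```ruby
# Hypothetical sketch of a feedbag-style command; not the gem's actual executable.

# Formats one URL and its discovered feeds, mimicking the session above.
def format_feeds(url, feeds)
  lines = ["== #{url}:"]
  feeds.each { |feed| lines << " - #{feed}" }
  lines.join("\n")
end

# Looks up each URL with `finder` (any callable that takes a URL and returns
# an array of feed URLs). Defaults to Feedbag.find, loading the gem lazily.
def run(urls, finder = nil)
  finder ||= lambda do |url|
    require "feedbag"
    Feedbag.find(url)
  end
  urls.map { |url| format_feeds(url, finder.call(url)) }.join("\n")
end

if __FILE__ == $PROGRAM_NAME
  if ARGV.empty?
    warn "usage: #{File.basename($PROGRAM_NAME)} URL [URL ...]"
  else
    puts run(ARGV)
  end
end
```

Injecting the finder keeps the output formatting separate from the network lookup, so the formatting can be checked with a stub such as `->(url) { ["http://example.com/feed"] }`.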
Feedbag significantly faster than Rfeedfinder
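Alright, so Feedbag looks to be significantly faster than Rfeedfinder, in wall-clock time, when tested against several different URIs (all times are in seconds; user, system and total are CPU time, real is elapsed time):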
                      user    system     total        real
log.damog.net:
  feedbag         0.280000  0.050000  0.330000 (  1.486712)
  rfeedfinder     0.140000  0.030000  0.170000 (  4.259612)
http://cnn.com:
  feedbag         0.200000  0.020000  0.220000 (  0.703625)
  rfeedfinder     0.240000  0.030000  0.270000 (  1.071508)
scripting.com:
  feedbag         0.170000  0.030000  0.200000 (  0.682292)
  rfeedfinder     0.220000  0.040000  0.260000 (  1.668234)
mx.planetalinux.org:
  feedbag         0.550000  0.050000  0.600000 (  1.636884)
  rfeedfinder     0.760000  0.120000  0.880000 (  3.189143)
http://feedproxy.google.com/UniversoPlanetaLinux:
  feedbag         0.030000  0.010000  0.040000 (  0.696871)
  rfeedfinder     0.160000  0.030000  0.190000 (  1.613874)
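To put "significantly faster" into numbers, the wall-clock (real) times from the table above can be turned into per-URL speedup ratios. A small sketch that uses only those measured values (`REAL_TIMES` and `speedups` are illustrative names):

```ruby
# Wall-clock ("real") seconds from the benchmark above, feedvalidator off.
REAL_TIMES = {
  "log.damog.net"                                    => { feedbag: 1.486712, rfeedfinder: 4.259612 },
  "http://cnn.com"                                   => { feedbag: 0.703625, rfeedfinder: 1.071508 },
  "scripting.com"                                    => { feedbag: 0.682292, rfeedfinder: 1.668234 },
  "mx.planetalinux.org"                              => { feedbag: 1.636884, rfeedfinder: 3.189143 },
  "http://feedproxy.google.com/UniversoPlanetaLinux" => { feedbag: 0.696871, rfeedfinder: 1.613874 },
}.freeze

# How many times longer Rfeedfinder took than Feedbag for each URL.
def speedups(times)
  times.transform_values { |t| (t[:rfeedfinder] / t[:feedbag]).round(2) }
end

speedups(REAL_TIMES).each { |url, ratio| puts format("%-50s %.2fx", url, ratio) }
```

By this measure Feedbag finishes roughly 1.5 to 2.9 times sooner on these URLs. Note that on log.damog.net Feedbag actually used more CPU time (total 0.33 vs 0.17), so the gap there is in non-CPU time.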
As I blogged previously, Feedbag can also use feedvalidator to get more accurate results. The results above were obtained with feedvalidator deactivated, which is the default behavior anyway. With it activated, these are the results:
                      user    system     total        real
log.damog.net:
  feedbag         0.390000  0.070000  0.460000 (  3.434350)
  rfeedfinder     0.170000  0.030000  0.200000 (  2.819837)
http://cnn.com:
  feedbag
    Feed looked like feed but might not have passed validation or timed out
                  0.200000  0.020000  0.220000 (  1.103810)
  rfeedfinder     0.200000  0.030000  0.230000 (  1.036161)
scripting.com:
  feedbag         0.220000  0.030000  0.250000 (  1.282081)
  rfeedfinder     0.150000  0.040000  0.190000 (  1.520435)
mx.planetalinux.org:
  feedbag         0.660000  0.050000  0.710000 (  2.784598)
  rfeedfinder     0.760000  0.110000  0.870000 (  3.984222)
http://feedproxy.google.com/UniversoPlanetaLinux:
  feedbag         0.050000  0.010000  0.060000 (  1.275603)
  rfeedfinder     0.170000  0.030000  0.200000 (  2.067279)
So Rfeedfinder appears to be slightly faster on small pages, but it is still slower than Feedbag on big ones (even when Feedbag calls feedvalidator, which makes it request the page twice!). It's also noticeable that Feedbag returns more complete results than Rfeedfinder: it returns every feed it finds as an array, while Rfeedfinder returns a single URL:
>> Feedbag.find "http://log.damog.net"
=> ["http://feeds.feedburner.com/InfinitePigTheorem", "http://log.damog.net/category/feed/", "http://log.damog.net/tag/feed/", "http://github.com/damog/rfeed", "http://log.damog.net/tag/rfeed/", "http://log.damog.net/comments/feed/"]
>> require "rfeedfinder"
=> true
>> Rfeedfinder.feed "http://log.damog.net"
=> "http://feedproxy.feedburner.com/InfinitePigTheorem"
Given these results, it really makes more sense to use Feedbag than Rfeedfinder.
The benchmark code can be found here. When writing the benchmark, I put the Feedbag requests first, so that any caching would be more likely to favor Rfeedfinder than Feedbag; even then, Feedbag came out ahead.
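A harness along these lines, built on Ruby's standard `Benchmark` module, produces output in the same user/system/total/real shape. It is a sketch, not the original benchmark script: `compare_finders` is a hypothetical name, and each finder is any callable that takes a URL (for example `->(u) { Feedbag.find(u) }` or `->(u) { Rfeedfinder.feed(u) }`, assuming those gems are installed). Finders run in the order they are listed, so listing Feedbag first reproduces the ordering described above.

```ruby
require "benchmark"

# Times every finder against every URL and prints Benchmark.bm-style rows.
# `finders` maps a label to a callable taking a URL; Hash insertion order
# decides which finder hits each URL first.
# Returns { url => { label => Benchmark::Tms } } for later inspection.
def compare_finders(urls, finders)
  results = {}
  urls.each do |url|
    puts "#{url}:"
    tms = Benchmark.bm(12) do |bm|
      finders.each do |label, finder|
        bm.report(label) { finder.call(url) }
      end
    end
    # Benchmark.bm returns one Tms per report, in the same order.
    results[url] = finders.keys.zip(tms).to_h
  end
  results
end
```

Returning the `Benchmark::Tms` objects (which expose `real`, `utime`, `stime` and `total`) makes it easy to compute ratios afterwards instead of reading them off the printed table.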
Feedbag now using feedvalidator
There's a special case I hadn't spotted in Feedbag. Of the different methods Feedbag uses to discover the feed for a given URL, the very first one is a lookup in a table of "known" content types. If the URL is served with any of the following content types, Feedbag simply returns that same URL, assuming it's the feed itself:
@content_types = [
'application/x.atom+xml',
'application/atom+xml',
'application/xml',
'text/xml',
'application/rss+xml',
'application/rdf+xml',
]
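A membership check against that list could look like the following minimal sketch. `feed_content_type?` is a hypothetical helper, and dropping parameters such as `; charset=utf-8` and comparing case-insensitively are assumptions about handling real Content-Type headers, not necessarily what Feedbag does.

```ruby
# Known feed content types, copied from the list above.
FEED_CONTENT_TYPES = [
  "application/x.atom+xml",
  "application/atom+xml",
  "application/xml",
  "text/xml",
  "application/rss+xml",
  "application/rdf+xml",
].freeze

# True when a Content-Type header names one of the known feed types.
# Parameters after ";" are ignored and case is normalized, because media
# types are case-insensitive.
def feed_content_type?(header)
  return false if header.nil?
  media_type = header.split(";").first.to_s.strip.downcase
  FEED_CONTENT_TYPES.include?(media_type)
end
```

With this check, `text/xml; charset=UTF-8` counts as a feed while `text/html` falls through to the later discovery steps.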
However, what happens if a valid feed is served with none of those content types? Well, Feedbag wouldn't recognize the URL as a feed and would start parsing it as HTML instead, which is time-consuming (and unnecessary, after all). Because of this, I've added W3C feed validation, using the nice feedvalidator gem, between the content-type lookup and the HTML parsing. Since this adds an extra dependency, I've left it optional: if the gem is available, Feedbag uses it; otherwise it skips validation and goes straight to parsing the HTML.
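The resulting order (content-type lookup, then validation only when the optional gem is present, then HTML parsing) can be sketched as below. This is a hypothetical outline, not Feedbag's code: each step is passed in as a callable so the control flow runs without network access, and `"feed_validator"` as the gem's require path is an assumption.

```ruby
# Loads an optional dependency; true if it loaded, false if it is absent.
def optional_require(name)
  require name
  true
rescue LoadError
  false
end

# Hypothetical discovery order. Each step is a callable taking the URL:
#   known_type  -> true if the response Content-Type is a known feed type
#   validator   -> true if the document validates as a feed (nil if unavailable)
#   html_parser -> array of feed URLs found by parsing the page as HTML
def discover(url, known_type:, validator:, html_parser:)
  return [url] if known_type.call(url)
  return [url] if validator && validator.call(url)
  html_parser.call(url) # the expensive fallback
end

# Assumed require path; false simply means validation is skipped.
VALIDATOR_LOADED = optional_require("feed_validator")
```

Because the validator is only consulted when it exists, a missing gem changes performance, not correctness: the HTML parser still gets the final say.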
You can see the patch in this commit.
Introducing Feedbag: Feed auto-discovery Ruby library/tool
Last week, I spent some time building a feed auto-discovery tool in Ruby that I actually liked, to use in another project I'm building, rFeed. I liked CPAN's Feed::Find, and at some point I wrote a wrapper class that ran a Perl script using that module; however, I wasn't happy mixing the two. So, Feedbag was born:
>> require "rubygems"
=> true
>> require "feedbag"
=> true
>> Feedbag.find "log.damog.net"
=> ["http://feeds.feedburner.com/TeoremaDelCerdoInfinito", "http://log.damog.net/comments/feed/"]
>> planet_feeds = Feedbag.find("planet.debian.org")
[ ... ]
>> planet_feeds.first(3)
=> ["http://planet.debian.org/rss10.xml", "http://planet.debian.org/rss20.xml", "http://planet.debian.org/atom.xml"]
>> planet_feeds.size
=> 104
>>
It makes smart use of relative and absolute bases, hrefs, links, content types, etc. It is also a single Ruby file, so you can grab it and drop it into your application, and its only dependency is Hpricot. It finds all feeds linked from a web page, but returns the most important ones at the beginning of the resulting array (see the Planet Debian example above).
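To illustrate the relative/absolute resolution and the "most important first" ordering, here is a deliberately simplified sketch. It is not Feedbag's implementation: it uses regular expressions instead of Hpricot (so it only copes with tidy markup), `find_feed_links` is a hypothetical name, and ranking `<link rel="alternate">` feeds ahead of feed-looking `<a>` links is an assumed heuristic. Relative hrefs are resolved with the standard library's `URI.join`, honoring a `<base href>` when the page declares one.

```ruby
require "uri"

# Feed MIME types looked for in <link> tags (assumed subset).
FEED_LINK_TYPES = %w[application/rss+xml application/atom+xml application/rdf+xml].freeze

# Regex-based sketch of feed discovery in an HTML page.
# Returns absolute URLs, with <link rel="alternate"> feeds ranked first.
def find_feed_links(html, page_url)
  # Honor <base href="..."> if present; otherwise resolve against the page URL.
  base = html[/<base\s[^>]*href=["']([^"']+)["']/i, 1] || page_url

  # 1. Autodiscovery <link> tags with a feed content type (highest priority).
  link_hrefs = html.scan(/<link\s[^>]*>/i)
                   .select { |tag| tag =~ /rel=["']?alternate/i && FEED_LINK_TYPES.any? { |t| tag.downcase.include?(t) } }
                   .map { |tag| tag[/href=["']([^"']+)["']/i, 1] }
                   .compact

  # 2. Plain <a> links whose href merely looks like a feed (lower priority).
  anchor_hrefs = html.scan(/<a\s[^>]*href=["']([^"']+)["']/i)
                     .flatten
                     .select { |href| href =~ /rss|atom|feed|\.xml/i }

  # Resolve relative hrefs to absolute URLs and drop duplicates, keeping order.
  (link_hrefs + anchor_hrefs).map { |href| URI.join(base, href).to_s }.uniq
end
```

A real HTML parser is the safer choice in practice; the regexes here only keep the example dependency-free.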
Synopsis, README and a brief tutorial have been placed at axiombox.com/feedbag. You can also take a look at the git repo, hosted on GitHub.



