Python Developer and Educator
2015-03-09
The Framework: Django 1.3 (not my choice!)
The Mission: Create an RSS feed that includes an extra field for full story content, plus a few additional fields for images
I know. It's 2015 and we're still on 1.3. But the changes to Django syndication since 1.3 haven't been that dramatic, so you might find this useful if you're in the same boat.
Let's start with the methods needed to add some custom content to a <content:encoded> element.
The Django feed library comes with a set of standard elements for which you must define the content: <title>, <link>, and <description> for the feed, and then of course <title>, <link>, and <description> for individual feed items.
In our use case, we have a feed containing a list of news stories. We're already sending a truncated version of each story's content to <description>, but we want to add an additional field - <content:encoded> - to return the story's full content.
To add an additional element (or two or three), there are a few places you'll need to update - two (possibly three) standard feed methods and whatever custom method(s) you need to populate the new elements.
In this code sample, follow the trail from item_extra_kwargs() to item_your_custom_field() to add_item_elements().
    from django.contrib.syndication.views import Feed
    from django.utils.feedgenerator import Rss201rev2Feed


    class ExtendedRSSFeed(Rss201rev2Feed):
        """
        Create a type of RSS feed that has content:encoded elements.
        """
        def root_attributes(self):
            attrs = super(ExtendedRSSFeed, self).root_attributes()
            # Because I'm adding a <content:encoded> field, I first need to declare
            # the content namespace. For more information on how this works, check
            # out: http://validator.w3.org/feed/docs/howto/declare_namespaces.html
            attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
            return attrs

        def add_item_elements(self, handler, item):
            super(ExtendedRSSFeed, self).add_item_elements(handler, item)

            # 'content_encoded' is added to the item below, in item_extra_kwargs()
            # It's populated in item_your_custom_field(). Here we're creating
            # the <content:encoded> element and adding it to our feed xml
            if item['content_encoded'] is not None:
                handler.addQuickElement(u'content:encoded', item['content_encoded'])

    ...

    class YourFeed(Feed):
        feed_type = ExtendedRSSFeed

        ....

        def item_extra_kwargs(self, item):
            # This is probably the first place you'll add a reference to the new
            # content. Start by calling the parent method, then append your
            # extra field and call the method you'll use to populate it.
            extra = super(YourFeed, self).item_extra_kwargs(item)
            extra.update({'content_encoded': self.item_your_custom_field(item)})
            return extra

        def item_your_custom_field(self, item):
            # This is your custom method for populating the field.
            # Name it whatever you want, so long as it matches what
            # you're calling from item_extra_kwargs().
            # What you do here is entirely dependent on what your
            # system looks like. I'm using a simple queryset example,
            # but this is not to be taken literally.
            obj_id = item['my_item_id']
            query_obj = MyStoryModel.objects.get(pk=obj_id)
            # Model fields are read as attributes, not dictionary keys
            full_text = query_obj.full_story_content
            return full_text
This generates a feed that looks something like this:
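As a hand-written illustration of the general shape (not real output from our system), the sketch below builds a tiny feed string and uses the standard library's ElementTree to check that the content prefix only parses when the namespace is declared on the root element. The titles, the story text, and the helper names build_feed and parses are made up for this example.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def build_feed(declare_namespace):
    """Return a tiny hand-written RSS document (hypothetical sample data)."""
    ns_attr = ' xmlns:content="%s"' % CONTENT_NS if declare_namespace else ""
    return (
        '<rss version="2.0"%s><channel><item>'
        "<title>Story</title>"
        "<description>Truncated story...</description>"
        "<content:encoded>Full story text</content:encoded>"
        "</item></channel></rss>" % ns_attr
    )


def parses(xml_text):
    """True if the text is well-formed, namespace-aware XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


with_namespace = parses(build_feed(True))
without_namespace = parses(build_feed(False))

# With the declaration in place, the element can be looked up by namespace.
root = ET.fromstring(build_feed(True))
full_text = root.find(".//{%s}encoded" % CONTENT_NS).text
```

Without the declaration that root_attributes() adds, a namespace-aware parser rejects the content: prefix outright, which is why that method is the first thing to override.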
My actual use case called for me to extend from a feed that already existed, leaving that original feed intact and only including the new element in the new feed. Here's how you'd do that:
    class YourFeed(Feed):
        feed_type = ExtendedRSSFeed

        ....

        def item_extra_kwargs(self, item):
            extra = super(YourFeed, self).item_extra_kwargs(item)
            extra.update({'content_encoded': self.item_your_custom_field(item)})
            return extra

        def item_your_custom_field(self, item):
            return None


    class YourNewFeed(YourFeed):
        def item_your_custom_field(self, item):
            ...
            return full_text
So in the original feed, 'content_encoded' comes back as None and <content:encoded> never appears. It is only generated for the new feed.
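Here is a minimal, Django-free sketch of that override pattern, in case it helps to see it run. It uses the standard library's XMLGenerator as a stand-in for the feed handler; the class names BaseFeedSketch and NewFeedSketch, the render_item() helper, and the sample story are all invented for illustration and are not part of Django.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


class BaseFeedSketch(object):
    """Stand-in for the original feed: the custom field is always None."""

    def item_your_custom_field(self, item):
        return None

    def item_extra_kwargs(self, item):
        return {"content_encoded": self.item_your_custom_field(item)}

    def render_item(self, item):
        out = StringIO()
        handler = XMLGenerator(out, "utf-8")
        handler.startElement("item", {})
        handler.startElement("title", {})
        handler.characters(item["title"])
        handler.endElement("title")
        extra = self.item_extra_kwargs(item)
        # Same guard as in add_item_elements(): skip the element on None
        if extra["content_encoded"] is not None:
            handler.startElement("content:encoded", {})
            handler.characters(extra["content_encoded"])
            handler.endElement("content:encoded")
        handler.endElement("item")
        return out.getvalue()


class NewFeedSketch(BaseFeedSketch):
    """Stand-in for the new feed: only the custom field method changes."""

    def item_your_custom_field(self, item):
        return item["full_text"]


story = {"title": "Headline", "full_text": "The whole story."}
original_xml = BaseFeedSketch().render_item(story)
new_xml = NewFeedSketch().render_item(story)
```

The subclass only overrides the one method, so everything else about the original feed stays untouched.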
The customer requesting this new feed actually asked for HTML wrapped in a CDATA section. I never did figure out how to do that with the Django syndicator alone. The CDATA tag always came out encoded, and there didn't seem to be any way around that. Every blog post I found led me back to this old bug ticket - https://code.djangoproject.com/ticket/15936 - which suggests just ditching the CDATA section and letting Django handle the encoding. I tried that, but it didn't pass the W3C feed validator - more on that later.
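A small sketch of why the CDATA markers come out encoded, assuming the handler behaves like the standard library's XMLGenerator (Django's feed handler, SimplerXMLGenerator, is a subclass of it). The helper name render_content_encoded and the sample HTML are made up for this example.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


def render_content_encoded(text):
    """Write text inside a <content:encoded> element and return the XML."""
    out = StringIO()
    handler = XMLGenerator(out, "utf-8")
    handler.startElement("content:encoded", {})
    # characters() escapes &, < and > in whatever it is given,
    # so the CDATA markers themselves get escaped too.
    handler.characters(text)
    handler.endElement("content:encoded")
    return out.getvalue()


cdata_result = render_content_encoded("<![CDATA[<p>Hello</p>]]>")
```

Anything passed through characters() is treated as text, so there is no way to hand it a literal CDATA section.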
One of the things we tried along the way was a custom template for the CDATA content. That didn't work for creating a CDATA section, as ultimately there was no way to prevent the tag from being encoded. But I didn't find many clear posts about how to do this so I thought I'd share an outline of the attempt here:
    from django.template import loader, Context, TemplateDoesNotExist

    ...

        def item_extra_kwargs(self, item):
            extra = super(ListDetailRSS, self).item_extra_kwargs(item)

            # Define a template - give it any name, the one below is just an example.
            # The path will obviously depend on your settings.
            content_encoded_template = 'feeds/list_detail_content_encoded.html'

            try:
                # Use the Django template loader to get the template
                content_encoded_tmp = loader.get_template(content_encoded_template)
                # Set the field value as template context
                content_encoded = content_encoded_tmp.render(
                    Context({'myobj': self.item_your_custom_field(item)}))
                # Then update your extra kwargs with the rendered template
                # instead of the original value returned from your custom method
                extra.update({'content_encoded': content_encoded})
            except TemplateDoesNotExist:
                # And if you don't have a template, just use the content as
                # returned from your custom method
                extra.update({'content_encoded': self.item_your_custom_field(item)})

            return extra
Your template can be as simple as this:
{{ myobj }}
This can be useful if you want to customize your value by wrapping some text around it or maybe apply template filters before it's rendered.
Our new <content:encoded> element is supposed to have a few other fields inside it. What I'm showing you here ultimately didn't work for us (see the encoding section below), but I did learn a thing or two about how to wrap elements inside other elements in ways that aren't covered in the Django documentation (I should get on adding that, right?).
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)

        if item['content_encoded'] is not None:
            # <content:encoded> is going to wrap around some other elements,
            # so instead of using handler.addQuickElement() we're going to
            # use startElement() (and then end it later)
            handler.startElement(u"content:encoded", {})

            # handler.characters() fills in content between the tags, e.g.:
            # <content:encoded>This is where the content goes.</content:encoded>
            handler.characters(item['content_encoded'])

            # And close the element, ba-bam.
            handler.endElement(u"content:encoded")
If you wanted to apply attributes to the element itself, that empty dict you set at startElement() would look like this instead:
    if item['content_encoded'] is not None:
        handler.startElement(u"content:encoded", {'my-attribute': 'my-value'})
And here's the wrapping around other elements bit:
    if item['content_encoded'] is not None:
        handler.startElement(u"content:encoded", {})
        handler.characters(item['content_encoded'])

        # Suppose we have a photo to go along with this story
        if item['media'] is not None:
            handler.startElement(u'figure', {'type': 'image/jpeg'})
            handler.startElement(u'image', {
                'src': item['media']['src'],
                'caption': item['media']['caption']
            })
            handler.endElement(u'image')
            handler.endElement(u'figure')

        handler.endElement(u"content:encoded")
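To see roughly what those nested calls emit, here is a standalone sketch that uses the standard library's XMLGenerator in place of Django's handler (Django's SimplerXMLGenerator subclasses it, so startElement(), characters(), and endElement() should behave the same way). The function name render_nested and the sample URL and caption are assumptions for illustration only.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


def render_nested(content, media=None):
    """Emit <content:encoded> with an optional nested <figure><image/></figure>."""
    out = StringIO()
    handler = XMLGenerator(out, "utf-8")
    handler.startElement("content:encoded", {})
    handler.characters(content)
    if media is not None:
        # Each startElement() must be matched by an endElement(),
        # closed in the reverse order they were opened.
        handler.startElement("figure", {"type": "image/jpeg"})
        handler.startElement("image", {
            "src": media["src"],
            "caption": media["caption"],
        })
        handler.endElement("image")
        handler.endElement("figure")
    handler.endElement("content:encoded")
    return out.getvalue()


nested = render_nested(
    "Story text",
    {"src": "http://example.com/photo.jpg", "caption": "A photo"},
)
```

The nesting in the output mirrors the order of the start and end calls, which is really all there is to wrapping one element inside another.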
Back to that old bug ticket. We ultimately decided to follow the sage advice to forget about CDATA, even though the suggested code didn't work exactly as described (whether that's because of our old version of Django, or our version customizations, I don't know, but I never had time to research it).
Instead, we had to ... double encode? Or rather, escape, then let Django encoding do its thing.
After all that work to wrap elements one inside the other, our feed still wasn't validating. So instead of creating them as elements, we just converted the tags to strings:
    if item['content_encoded'] is not None:
        handler.startElement(u"content:encoded", {})
        handler.characters(item['content_encoded'])

        if item['media'] is not None:
            figure = '<figure type="image/jpeg">'
            figure += '<image src="%s" caption="%s"></image>' % \
                (item['media']['src'], item['media']['caption'])
            figure += '</figure>'
            # Don't forget to stick that string in the middle of
            # the <content:encoded> element:
            handler.characters(figure)

        handler.endElement(u"content:encoded")
Ugly, yes, but it almost worked. At least it failed in a different way.
After some trial and error, I found that ultimately I had to do some XML-specific escaping. I wound up using the escape() function from the SAX utilities, applying it to the story content as it was being returned from my custom method, and also to the string for that <figure> tag inside <content:encoded>.
    from xml.sax.saxutils import escape

    ...

        def add_item_elements(self, handler, item):
            ...
            if item['content_encoded'] is not None:
                handler.startElement(u"content:encoded", {})
                handler.characters(item['content_encoded'])

                # Suppose we have a photo to go along with this story
                if item['media'] is not None:
                    figure = '<figure type="image/jpeg">'
                    ...
                    handler.characters(escape(figure))

                handler.endElement(u"content:encoded")

    ...

    class YourNewFeed(YourFeed):
        def item_your_custom_field(self, item):
            ...
            return escape(full_text)
What that returns looks slightly uglier. But guess what? It validates.
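To get a feel for what the double escaping does to the markup, here is a small sketch that again uses the standard library's XMLGenerator as a stand-in for Django's feed handler (which subclasses it). The sample HTML and the helper name render_content_encoded are invented for illustration.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator, escape


def render_content_encoded(text):
    """Write text inside a <content:encoded> element and return the XML."""
    out = StringIO()
    handler = XMLGenerator(out, "utf-8")
    handler.startElement("content:encoded", {})
    handler.characters(text)  # the handler escapes once on its own
    handler.endElement("content:encoded")
    return out.getvalue()


html = "<p>Tom & Jerry</p>"

# Handler escaping only: < becomes &lt;, & becomes &amp;
single = render_content_encoded(html)

# escape() first, then the handler escapes the ampersands again
double = render_content_encoded(escape(html))
```

In the double-escaped version every entity has its ampersand escaped a second time (&lt; becomes &amp;lt;), so a consumer has to unescape twice to get back to the original HTML.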
    from xml.sax.saxutils import escape

    from django.contrib.syndication.views import Feed
    from django.utils.feedgenerator import Rss201rev2Feed


    class ExtendedRSSFeed(Rss201rev2Feed):
        def root_attributes(self):
            attrs = super(ExtendedRSSFeed, self).root_attributes()
            attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
            return attrs

        def add_item_elements(self, handler, item):
            super(ExtendedRSSFeed, self).add_item_elements(handler, item)

            if item['content_encoded'] is not None:
                handler.startElement(u"content:encoded", {})
                handler.characters(item['content_encoded'])

                if item['media'] is not None:
                    figure = '<figure type="image/jpeg">'
                    figure += '<image src="%s" caption="%s"></image>' % \
                        (item['media']['src'], item['media']['caption'])
                    figure += '</figure>'
                    handler.characters(escape(figure))

                handler.endElement(u"content:encoded")

    ...

    class YourFeed(Feed):
        feed_type = ExtendedRSSFeed

        ....

        def item_extra_kwargs(self, item):
            extra = super(YourFeed, self).item_extra_kwargs(item)
            extra.update({'content_encoded': self.item_your_custom_field(item)})
            extra.update({'media': self.item_your_custom_media_field(item)})
            return extra

        def item_your_custom_field(self, item):
            return None

        def item_your_custom_media_field(self, item):
            return None


    class YourNewFeed(YourFeed):
        def item_your_custom_field(self, item):
            obj_id = item['my_item_id']
            query_obj = MyStoryModel.objects.get(pk=obj_id)
            # Model fields are read as attributes, not dictionary keys
            full_text = query_obj.full_story_content
            return escape(full_text)

        def item_your_custom_media_field(self, item):
            obj_id = item['my_item_id']
            query_obj = MyStoryModel.objects.get(pk=obj_id)
            # 'photo' and its 'caption' are placeholders; adapt to your model
            photo = query_obj.photo.url
            caption = query_obj.photo.caption
            return {'src': photo, 'caption': caption}
Contact: barbara@mechanicalgirl.com