Why post-id may not be required
[RobertHahn] Ok. Well, it looks to me like we did a good job expanding on the particulars of the 'simple case', which I put forth as a reason not to make post-id required. I'm very happy about that. I'm not sure how this page should be refactored to reflect that this discussion has been summed up and that all points have been made. I certainly have nothing more of value to add here, as it seems that everyone who commented is clear about the position I took. I understand the opposite argument presented by Grant and Joe too (good work, guys!). MishaDynin's analogy is probably a good one, but I know less about the mail spec than about the HTML specs. I leave it for the gentle reader to decide.
I'm leaving my vote as 'required when conditions met', because I still think that's the best way to do it. But I think that this is a good page to look at if you're not quite sure why it must be required.
I feel that post-id should be optional. I will grant that there are many cases where including a post-id would make great sense, and many of these cases have been touched upon in Sam's Linkage article and summarized nicely by JoeGregorio here. I will probably need post-id in my own Echo feeds (assuming a product comes out of this), but the one case where I can't see the justification for it is the ultra-simple, possibly internally hosted site that only needs one Echo feed and has a 1:1 mapping of perma-link to representation (as opposed to the 3:1 that MT has).
As I understand this, requiring post-id for all Echo feeds is akin to requiring all HTML pages to use CSS, which HTML obviously doesn't do.
If the post-id becomes required, then in the ultra-simple case proposed above, the person constructing the feed would either duplicate the perma-link contents or, worse, make something up or leave it as an empty string. This becomes a code smell at the markup level.
But, but, but... what if the guy who constructed that Echo feed needs to 'ramp up' to a beefier logging tool, or change hosts, which may break the existing perma-links? That's a good question. My suggestion is that they use XSLT (err - assuming we're using an XML-friendly syntax) to transform the Echo feed so that post-ids are generated. Perhaps they could use the value of the perma-link as the source value.
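To make the transform idea concrete, here is a minimal sketch. It uses Python's standard `xml.etree.ElementTree` rather than XSLT, purely for illustration, and it assumes a hypothetical markup of `<feed>` containing `<entry>` elements, each with a `<perma-link>` and an optional `<post-id>` (the Echo syntax isn't settled, so these names just mirror the terms used on this page). Entries that already carry a non-empty post-id are left alone; the rest get a post-id copied from their perma-link.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <feed>/<entry>/<perma-link>/<post-id> are assumptions
# borrowed from the terms used in this discussion, not a finalized Echo syntax.
SAMPLE = """<feed>
  <entry><perma-link>http://example.com/2003/07/01/hello</perma-link></entry>
  <entry><perma-link>http://example.com/2003/07/02/again</perma-link><post-id>custom-id-42</post-id></entry>
</feed>"""


def add_missing_post_ids(feed_xml: str) -> str:
    """Return the feed with a <post-id> copied from <perma-link> wherever one is missing or empty."""
    root = ET.fromstring(feed_xml)
    for entry in root.iter("entry"):
        post_id = entry.find("post-id")
        if post_id is not None and (post_id.text or "").strip():
            continue  # already has a usable post-id; leave it alone
        link = entry.find("perma-link")
        if link is None or not (link.text or "").strip():
            continue  # nothing to copy from
        if post_id is None:
            post_id = ET.SubElement(entry, "post-id")
        post_id.text = link.text.strip()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_missing_post_ids(SAMPLE))
```

An XSLT stylesheet would express the same rule declaratively; the point is only that the upgrade can be a one-time, mechanical pass over the existing feed.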
What about implementation? While I haven't seen the code for the existing heavyweight blogging tools (unless you count Blosxom as one of them), I would imagine the logic goes like this: look for a post-id (if you need such a thing), and if it doesn't exist, copy the value of perma-link to post-id (for internal purposes only; don't also update the feed's source) and continue processing as usual. The same logic would apply to aggregators.
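The consumer-side fallback described above might look something like this sketch. The `Entry` type and its `perma_link`/`post_id` fields are hypothetical stand-ins for however a tool parses a feed; the fallback only affects the identifier the tool uses internally and never rewrites the feed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical parsed entry; post_id may be absent in a 'simple case' feed."""
    perma_link: str
    post_id: Optional[str] = None


def effective_id(entry: Entry) -> str:
    """Identifier used internally: the post-id if present and non-empty, else the perma-link."""
    if entry.post_id and entry.post_id.strip():
        return entry.post_id.strip()
    return entry.perma_link.strip()


if __name__ == "__main__":
    print(effective_id(Entry("http://example.com/2003/07/hello")))
    print(effective_id(Entry("http://example.com/2003/07/hello", "example-post-1")))
```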
Am I off base on this understanding? I haven't seen anyone explain why post-id is required even in the simplest case. But, just as there's a huge number of web pages using CSS, there'll also be a huge number of Echo feeds using post-id. And just as there are simple web pages that don't need or want CSS, I posit that there would be Echo feeds that don't need post-id either.
[JoeGregorio] I've gone on at considerable length about why I believe we need both a required post-id and a required perma-link, hopefully answering all of your concerns.
[RobertHahn] That is an excellent summary, and I agree with all your points. However, your explanation consistently avoids the simplest case: What if a post never crosses categories? What if the system utilized never has multiple archiving contexts? What if you're not using (building) a CMS that needs a post-id?
Perhaps the only edge case left in your discussion is, and I quote: "... this will allow the aggregator builders to track posts and allow the end-user to control whether they see the same item if it appears in multiple contexts." Which is an interesting case. If the source for a given perma-link is the guy who set up the bare-bones log, and someone else wants to merge it into their Echo feed, it would seem to me that such a thing must happen only after there's some dialogue, as it would be rude to simply 'lift' a log entry and put it in another feed without asking permission first.
In all those cases, yes, times would change, and the author of a bare-bones feed may need the things you describe. But in all those cases, if they happen at all, they must be noticed by the author of the feed, and author intervention would be required. Again, to go back to the CSS example: it would be a smart thing for someone to style their HTML with CSS 'now', but if they don't want to, the quality and validity of their content isn't sacrificed.
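The aggregator case quoted above (tracking posts and letting the reader choose whether to see an item again when it shows up in another context) can be sketched as follows. The `(source_feed, post_id, perma_link)` tuple shape and the `show_duplicates` switch are hypothetical. Identity is the post-id when present, otherwise the perma-link.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical item shape: (source_feed, post_id, perma_link).
# post_id is None when the producer omitted it.
Item = Tuple[str, Optional[str], str]


def merge_feeds(feeds: Iterable[List[Item]], show_duplicates: bool = False) -> List[Item]:
    """Merge several feeds, hiding repeats of the same post unless the reader asks to see them."""
    seen = set()
    merged: List[Item] = []
    for feed in feeds:
        for item in feed:
            _source, post_id, perma_link = item
            # Identity is the post-id when present, otherwise the perma-link.
            key = post_id if post_id else perma_link
            if key in seen and not show_duplicates:
                continue
            seen.add(key)
            merged.append(item)
    return merged


if __name__ == "__main__":
    by_date = [("main", "example-post-1", "http://example.com/2003/07/hello")]
    by_cat = [("cats", "example-post-1", "http://example.com/cats/hello")]
    print(merge_feeds([by_date, by_cat]))
```

This cuts both ways. With a single perma-link per post (the simple case), the perma-link fallback is enough. But if the same post is published under two different perma-links and carries no post-id, the two copies get different keys and the aggregator can't tell they are the same post.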
[JoeGregorio] Robert, I did address the simple case by saying, "if the posts from your CMS only have one URL then just set post-id = perma-link. Sure it's a little redundant, but it's easy to implement."
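On the producer side, "just set post-id = perma-link" could be as small as this sketch. The `<entry>`, `<perma-link>` and `<post-id>` element names and the `make_entry` helper are hypothetical. The only real requirement shown is that both values are XML-escaped, using the standard library's `xml.sax.saxutils.escape`.

```python
from typing import Optional
from xml.sax.saxutils import escape


def make_entry(perma_link: str, post_id: Optional[str] = None) -> str:
    """Emit a hypothetical <entry> carrying both <perma-link> and <post-id>."""
    # Simple case from the discussion: one URL per post, so reuse it as the post-id.
    pid = post_id if post_id else perma_link
    return (
        "<entry>"
        f"<perma-link>{escape(perma_link)}</perma-link>"
        f"<post-id>{escape(pid)}</post-id>"
        "</entry>"
    )


if __name__ == "__main__":
    print(make_entry("http://example.com/2003/07/hello"))
```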
[BenMeadowcroft] Joe, your implementation implies the use of a CMS; I do not use a CMS, other than some text editing macros set up in my favourite text editor. On the CMS side, setting "post-id = perma-link" is fairly trivial, but why not move this requirement from the CMS onto the aggregator? While we cannot be sure that the author of an Echo/Pie feed uses a CMS, we can be sure that the aggregator is a program of some sort (unless you know any normal people who read raw RSS feeds for their news). Given that we cannot assume a CMS is used (I don't currently use one, for example) but we can more strongly assert that an aggregator is used, we should move the "post-id = perma-link" step into the aggregator. post-id should not be required; its functionality can be implemented by the aggregator when the post-id is not present. Requiring it is too strong, in my opinion.
[RobertHahn] Joe: I'm concerned that the solution you proposed would be confusing to people new to authoring Echo feeds. As syndication begins to move to the masses, people less savvy than us, or with less time to appreciate these subtleties, would be oddly tripped up by this requirement, particularly if they're only proverbially testing the waters with their toe. I understand it, you understand it, and obviously there's a raft of other people voting that post-id be required. I'm making a case for the guy who hasn't seen this stuff yet, but will in a year, after this hand-wringing and angst has become a thing of the past. As it stands, Echo looks *simple*. This requirement for post-id, from the perspective of a newcomer, is strange and unexpected, and does not follow the Principle of Least Surprise.
[GrantCarpenter, RefactorOk] It sounds like you're both agreeing about everything except where the onus for defaulting post-id == perma-link resides: on enlightened clients, or as a baseline part of generating a valid feed. On that point, I'm leaning towards thinking it's more likely to break down if it's implicit (consumer) rather than explicit (producer). With consumers implicitly filling in the blanks, it's an opaque process; in many cases you'll only be able to infer whether a given client agent is doing so properly. Explicitly requiring a feed to include a post-id (and possibly perform the duplication itself) seems fairly transparent: a quick view source of the feed, or a validation against a schema/DTD/what-have-you, will verify whether or not it's on spec. Personally, scenarios where feeds are hand-generated and this amounts to needless tedium seem to me nearly as much of an edge case as someone reading a raw feed. We're talking about Echo having an excruciatingly specific spec and tools that give definitive validation; I don't know that simply requiring post-id puts this that much further out of the reach of Jane Sixer. I might even argue that, with the various flavors of RSS today, the lack of specificity about what is good and bad RSS is far more challenging.
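The explicit, producer-side approach is easy to check mechanically. Here is a minimal sketch of the kind of test a validator might run, short of a full schema or DTD; the `<entry>`/`<post-id>` names and the `entries_missing_post_id` helper are hypothetical. It reports which entries lack a non-empty post-id.

```python
import xml.etree.ElementTree as ET
from typing import List


def entries_missing_post_id(feed_xml: str) -> List[int]:
    """Return 1-based positions of <entry> elements with no non-empty <post-id>."""
    root = ET.fromstring(feed_xml)
    missing = []
    for position, entry in enumerate(root.iter("entry"), start=1):
        post_id = entry.find("post-id")
        if post_id is None or not (post_id.text or "").strip():
            missing.append(position)
    return missing


if __name__ == "__main__":
    feed = "<feed><entry><post-id>a</post-id></entry><entry></entry></feed>"
    print(entries_missing_post_id(feed))
```

Treating an empty `<post-id>` as missing matters here, since an empty string is one of the workarounds people would likely reach for if the element were required but not understood.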
One thing that does concern me is how realistic it is to require post-ids to be globally unique. Sure, most everyone reading here can generate some valid form of GUID in the blink of an eye, but how does this play out in the wild, where generating valid XML for an RSS feed seems to be touch and go? I could just be overly worried here.
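One low-effort way to get a globally unique post-id, sketched here as an option rather than anything the spec prescribes, is a name-based UUID derived from the post's URL. Python's standard `uuid.uuid5` with the predefined `uuid.NAMESPACE_URL` namespace does this; `post_id_for` is a hypothetical helper name.

```python
import uuid


def post_id_for(perma_link: str) -> str:
    """Derive a stable, globally unique post-id from a post's perma-link."""
    # uuid5 hashes the URL within the standard URL namespace: the same perma-link
    # always yields the same id, and different URLs practically never collide.
    return uuid.uuid5(uuid.NAMESPACE_URL, perma_link).urn


if __name__ == "__main__":
    print(post_id_for("http://example.com/2003/07/hello"))
```

Because the id is derived from the perma-link, it should be computed once, when the post is first published, and then stored. Recomputing it after a host change would change the id as well. A random `uuid.uuid4()` works too, but it has to be stored from the start.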
[RobertHahn] Grant, your assessment of the discussion on this page is exactly correct.
At one point, you said, "Personally scenarios where feeds are being hand generated and this amounts to needless tedium seem like edge cases..." When you learned HTML (assuming you did), did you do it by running a WYSIWYG client or by viewing source? I hail from the bad old days of web design (Netscape 1.1N was new at the time), and almost all my education came from viewing source. Not to sound like I'm kissing Tim Bray's butt, but I think he's right to drive home the point that View-Source is an extremely important tenet of widespread, grassroots adoption. If Jane Sixer were just trying Echo out and got invalid feeds because she didn't need, or didn't understand the purpose of, post-id, then that's a barrier to Echo's acceptance.
I think I would be a lot happier (and it seems you would be too) if I knew exactly what kind of data value should go in there, especially for the 'edge case' that I'm proposing: if you don't need post-id, what should go there? Specify this up front, so that even if Jane Sixer doesn't understand *why* post-id is required, at least she knows she's doing it right by following these steps by rote.
[GrantCarpenter] I do agree with you and Tim Bray (and I believe Sam indicated a fair amount of agreement) about view source, at least for HTML. And yes, that's how I, and probably most others, learned HTML back in the day. With RSS I actually went to the specs first, found them over time to be vague or incomplete at turns (2.0), and then fell back on augmenting my understanding by looking at other people's feeds. But let's say I agree with you that view source is a relevant path to learning this type of standard's ins and outs initially (I do). Just the same, everybody and their uncle writes aggregators, and I'd wager there's a fair amount of viewing source going on there as well. So I'm not sure that this, in and of itself, makes one approach that much better than the other. In any event, there's no confusion on my end that this is a subjective item; I just think my conclusion in favor of requiring both elements follows from my desire for the spec to be as explicit as possible where it can be, without drifting too far towards the barriers to entry of RSS 1.0 or somesuch.
[MishaDynin] I disagree that post-id is the same as CSS in HTML -- it's more like a Message-ID in a mail message. Can you survive without it? Yes. But life's easier when it's always present.
[JoeGregorio] As for the "view-source" argument, if you see both tags in every Echo feed then you can be sure you'll include them in your own feed. If they're optional you might not see a feed with both in there and miss out on the advantages.