Diazo bug on HTML5 validation errors

We have been using Diazo to style client sites for a long time. I believe Diazo is the most efficient way to do Plone theming, compared with just applying CSS and overriding templates. The clear advantages of using Diazo are:

  • Theme reusability (even outside Plone),
  • Theme developers do not need knowledge of Plone,
  • Possibility of using XSLT without extra modules,
  • Easy backend development (do the markup without thinking of theming).

I am sure there are even more advantages of using Diazo over classic styling.

In the transition to Diazo we have, of course, run into some problems. Just learning how Diazo rules work takes some effort, and some of the software is certainly not bug-free. For example, we initially tried to do the transformation in Apache, but found that it was not really possible (I cannot remember the details; something like the path condition did not work). Then we moved on to a modified version of nginx that had extended support for things like path conditions. This worked great 99% of the time, but we found weird bugs such as fuzzy rendering of pages, and sometimes the XSLT transformation was not applied at all, which was hard to reproduce and debug.

What we really found to work was Plone’s built-in support for Diazo (plone.app.theming). It works great: we run it on many production sites and it is very stable. One day we received a bug report from our QA department. They had found that on some of our customers’ sites the HTML5 validation broke because of line feeds in the description field. The elements that broke the validation were the meta description and the title attribute on links, for example in the navigation. Line feeds in the description produce a meta description tag like this:

<meta name="description" content="test test test&#13;&#10;test test test" />

Every time this page is referenced from the navigation portlet, the title attribute of the link contains the same escaped line feeds. I traced the escaping and found that it happens in lxml. So what to do? In XHTML/HTML4 this is probably accepted behavior and does not break validation, but lxml does not really know whether we are dealing with HTML5 or XHTML/HTML4. I filed the bug (#13871), but unfortunately there has been no response so far; I expect people are busy working on Plone 5. I see two (three) solutions to this problem:

  1. Ensure that the description field does not allow line feeds. If an editor enters text that contains line feeds, strip them out and perhaps replace them with spaces. We may also need to migrate existing content.
  2. Fix the bug in lxml. This would involve C programming and would also be tricky because of the XHTML/HTML4 vs. HTML5 problem I described earlier.
  3. Use an ITransform adapter that removes the line feeds; see my example here (which we are currently using).
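To make the first and third options a bit more concrete, here is a minimal sketch of the shared core step: collapsing line feeds into single spaces, either on the raw description string or on the attributes of an already rendered page. It is not the adapter we run in production. It uses Python's standard-library ElementTree as a stand-in for the lxml tree an ITransform adapter would receive, it assumes markup without XML namespaces, and the names `collapse_linefeeds`, `strip_linefeeds_in_tree` and `ATTRIBUTES_BY_TAG` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes that carry the description text in the cases described above.
# This mapping is an assumption for illustration, not an exhaustive list.
ATTRIBUTES_BY_TAG = {"meta": "content", "a": "title"}


def collapse_linefeeds(text):
    """Replace every run of whitespace (including CR/LF) with one space."""
    return " ".join(text.split())


def strip_linefeeds_in_tree(root):
    """Normalize the listed attributes on every matching element in the tree."""
    for element in root.iter():
        attribute = ATTRIBUTES_BY_TAG.get(element.tag)
        if attribute is None:
            continue
        value = element.get(attribute)
        # Only touch values that actually contain a carriage return or line feed.
        if value is not None and ("\r" in value or "\n" in value):
            element.set(attribute, collapse_linefeeds(value))
    return root


# Stand-in for a rendered page: the same escaped line feeds as in the
# meta tag shown earlier, once in the head and once on a navigation link.
page = ET.fromstring(
    "<html><head>"
    '<meta name="description" content="test test test&#13;&#10;test test test" />'
    "</head><body>"
    '<a href="/page" title="test test test&#13;&#10;test test test">Page</a>'
    "</body></html>"
)
strip_linefeeds_in_tree(page)
print(ET.tostring(page, encoding="unicode"))
```

For the first option only `collapse_linefeeds` would be needed, called when the description is saved. The tree walk corresponds to the third option and has to visit every element on every response, which is the price of fixing the output instead of the stored data.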

I am sure the first one is the way to go. When I get some spare time I will see how it works out on Dexterity-based content types. What do you think? Feel free to comment on the bug (#13871).

5 thoughts on “Diazo bug on HTML5 validation errors”

  1. I’d just strip the line feeds when rendering the meta tag.
    The ITransform adapter example feels a bit performance-critical, as it iterates over the whole HTML tree again.

    1. Hi Johannes. That is not really sufficient, since the navigation portlet and the listings include the title attribute (the description is used there), where the line feeds are escaped in the same way, so you need to traverse the whole tree. 🙁

      1. Ah, you even mentioned this problem. I overlooked that…

        OK, maybe you need an event handler which strips the line feeds from the description field on every modification. If we agree to have this in core, then this can instead be done by the behavior class.

        Related: I’ve written an add-on which adds a richdescription field that is used instead of the normal description. It strips HTML tags for the description field, which is obviously used in many places.

        1. I think you are on the right track. Didn’t think of using an event, good catch.

          BTW Good to meet you in Cologne. 🙂

          1. Don’t know if it’s the best way, but event handlers are very handy for this. But when doing invokeFactory, the event is not fired (disclaimer: if I remember correctly).

            yeah, was a great pleasure to meet in cologne for me too! see you around soon 🙂
