Elasticsearch – getting autocompletion working

For a while now, the usual Python/Zope/Plone/Pyramid stack has been exchanged for JavaScript/node.js/express/jade. This has been a very interesting transition; the main struggle has been getting used to node.js running asynchronously, whereas everything else runs synchronously. It has also been fun to learn powerful tools like npm, bower, grunt, and yeoman, and programming techniques like programming with promises.

In this particular project we needed a search engine that could perform very *fast* searches on different criteria. In Plone/Zope you tend to use the catalog for these queries, but when you build a web application without that big a framework, you need something external to help you. The choices we had were Solr and Elasticsearch. Solr is the old horse of the two; its main problem when building applications like this is that it is schema-dependent. Elasticsearch, on the other hand, operates schema-free and adapts itself to the data provided by the application. Notice that we are not talking about a crawler here, just plain search engine functionality.

The idea is that the application feeds the search engine with data through a REST interface (specific to Elasticsearch), and queries are performed using the same REST interface. If you know nothing about Elasticsearch, please consult this tutorial or something similar. Getting search running is rather trivial, and you have quite powerful ways of writing queries. But getting autosuggestion working (so that you can write, for example, “New” and “New York” would be a suggestion) is a different story. It took me a while to figure out, hence this blog post.

The quick-and-dirty howto

The first thing you need to do is add an additional field to your mapping; let us call it “suggest”. Everything you store in this field will be offered as a suggestion. You can add it using the following command (substitute books with your own index):

curl -XPUT 'localhost:9200/books/book/_mapping' -d '{
  "book" : {
    "properties" : {
      "suggest": {
        "type": "completion",
        "index_analyzer": "simple",
        "search_analyzer": "simple",
        "payloads": true
      }
    }
  }
}'
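If you want to double-check that the mapping was applied, you can fetch it back with a plain GET (same index and type as in the example above):

curl -XGET 'localhost:9200/books/book/_mapping'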

Now, when you feed the search engine, remember to provide suggest: { input: ['suggestion', 'another suggestion'] } as part of each document.
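For example, indexing a hypothetical book document with suggestions could look like this (the title field and the payload are just illustrations, not part of the mapping above):

curl -XPUT 'localhost:9200/books/book/1' -d '{
  "title" : "New York travel guide",
  "suggest" : {
    "input" : ["New York", "NYC"],
    "output" : "New York",
    "payload" : { "id" : 1 }
  }
}'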

When asking for suggestions, use the following query (see implementation here):

"suggest" : {
  "text" : "New",
  "completion" : {
    "field" : "suggest",
    "fuzzy" : {
      "fuzziness" : 1
    }
  }
}

Here we query for suggestions matching the text “New”, and we allow a Levenshtein distance of 1, meaning that one letter may be incorrectly positioned, so “Nwe” will match “New” as well.
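To actually run it, you can POST the query to the index's _suggest endpoint; something like the following should work against the books index from above (the outer "suggest" key is just a name for the request):

curl -XPOST 'localhost:9200/books/_suggest' -d '{
  "suggest" : {
    "text" : "New",
    "completion" : {
      "field" : "suggest",
      "fuzzy" : {
        "fuzziness" : 1
      }
    }
  }
}'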

That’s it! Feel free to write a comment if you struggled with this feature as well.

Setting up traceview for Plone

As I described in my previous blog post, traceview can run on a stand-alone Plone site, i.e. without a front-end webserver. In this blog post I will describe the steps required to get started using traceview for Plone.

Step 1 – log in to traceview

The first step is of course to log in to the traceview interface; if you don’t have an account, sign up at AppNeta’s homepage. Apparently they have a free plan if you just have one application, i.e. one Zope instance. After you have logged in, go to “Get started” and select “Install host agent”. You will then see the appropriate command to run on the server. This installs the daemon that sends data to AppNeta, along with various C libraries.

Step 2 – set up your Python environment

You need the oboe module to use collective.traceview. It can be installed via pip or easy_install; I recommend using a virtual environment if you choose this approach. It can also be installed via buildout: the oboe egg is on PyPI, so adding it to the eggs list should not be a problem. Also remember to add collective.traceview to the eggs list.
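A minimal buildout sketch could look like this (the section name and the rest of the eggs list depend on your own buildout):

[instance]
...
eggs +=
    oboe
    collective.traceview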

Step 3 – setting up environment variables

You need to set a series of environment variables to inform Plone that it should do tracing, and how the actual tracing should be done. Ideally, you should set these environment variables via buildout, since you may have different profiles, e.g. test, production, etc. You should end up with a buildout configuration similar to this:

[instance]
...
environment-vars +=
    TRACEVIEW_PLONE_TRACING 1
    TRACEVIEW_IGNORE_EXTENSIONS js;css;png;jpeg;jpg;gif;pjpeg;x-png;pdf
    TRACEVIEW_IGNORE_FOUR_OH_FOUR 1
    TRACEVIEW_SAMPLE_RATE 0.3
    TRACEVIEW_TRACING_MODE always

Most of these variables are quite obvious, and each of them is explained in the README of collective.traceview. Basically, the configuration says that Plone should trace (TRACEVIEW_PLONE_TRACING 1), that js, css, etc. files should not be traced (TRACEVIEW_IGNORE_EXTENSIONS js;css;png;jpeg;jpg;gif;pjpeg;x-png;pdf), that 404 pages should not be traced (TRACEVIEW_IGNORE_FOUR_OH_FOUR 1), and that we trace just 3 out of 10 requests (TRACEVIEW_SAMPLE_RATE 0.3).

Now just run buildout, and new traces should appear in the “Default app” in the traceview interface. So you can see what an example trace looks like, I have attached a screenshot below (each layer can be inspected and further information is provided; for example, for catalog queries the entire query is logged).

[Screenshot: an example trace in the traceview interface]

Happy tracing!


Improved traceview support for Plone

Finally back from summer vacation, three weeks without a single line of code – how refreshing! It has been a long time since we did any work on collective.traceview, but we have finally implemented a feature that has been wanted for a long time.

The current state of the module relies on a front-end web server (for example, Apache or nginx) to kick off the trace. What happens is that the oboe module in Apache generates a unique trace id, referred to as the X-Trace header, which is sent to Plone and used as the reference for the full-stack tracing. Doing it this way is not always a good idea; consider the following scenarios:

  • You have no Apache (or nginx), maybe just Varnish in front, distributing requests directly to the Plone instances – why waste CPU power on running an additional web server?
  • You have Apache, then Varnish, then Plone. You will get quite bogus traces for cached pages served directly by Varnish, showing only Apache.

For these scenarios it makes much more sense to start the tracing in the ZServer HTTP server. So we added an additional patch to the product, patching the Medusa-based implementation (I was not turned into stone for looking at it). This patch can start the tracing, which gives us additional benefits beyond being able to leave out the front-end web server: it is now also possible to see the actual waiting time from when a request hits ZServer until it is picked up by the publisher. This may give you a hint as to whether you need more ZServer threads or more Zope instances. For example, on the screenshot below you can see a little waiting time around 9.00 AM between the ZServer HTTP server and the Zope publisher.

[Screenshot: layer summary in TraceView, showing waiting time between the ZServer HTTP server and the Zope publisher]

If you want to play with it, just follow the instructions in the README of collective.traceview to set the proper environment variables; these can preferably be set using buildout. The feature is provided by collective.traceview 1.3.

Please let me know if you have any feedback on the feature, or questions on how to use it.

UPDATE: GitHub master has experimental Chameleon support, so you can see how much time is spent rendering individual templates; the file name of the template is logged, so it is easy to debug template renderings that seem to be a bottleneck.

Responsive tricks for Plone sites

We recently launched a major redesign/restructuring of one of our major customers’ sites. The main achievement has been the responsive improvements. In this blog post I will share some of the lessons we learned from this redesign. I believe these lessons are relevant beyond Plone, since they are fairly generic solutions.

Use LESS or SASS/Compass

We did the responsiveness for tablet/mobile using LESS. The styling became a major task and amounts to many lines of CSS for a project of this size. LESS/SASS provides a nice way of splitting the styling into different files, so you can create mobile.less, tablet.less, theme.less, etc. We use media queries to deduce what kind of device the user has, so, for example, mobile.less has its media query at the beginning of the file. This gives a very nice overview and also reduces the probability of making mistakes.
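As a sketch, mobile.less could start like this (the breakpoint and the selector are just examples, not our actual styling):

// mobile.less – everything mobile-specific lives inside one media query
@media only screen and (max-width: 480px) {
    #portal-columns {
        float: none;
        width: 100%;
    }
}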

Responsive videos

When an editor inserts a YouTube video on the site, it is likely to have a fixed size. What we would like is that, on a mobile/tablet, whenever the user rotates the screen, the size of the video changes according to the new screen dimensions. The following jQuery (ECMAScript) code does this generically:

// Aspect ratio of the embedded video, computed once the page is ready.
var aspectRatio;

// Resize every iframe to the width of its parent, keeping the aspect ratio.
function resizeVideos() {
  $('iframe').each(function() {
    $(this).attr('width', $(this).parent().width());
    $(this).attr('height', $(this).parent().width() * aspectRatio);
  });
}

$(window).on("orientationchange", function(event) { resizeVideos(); });
$(window).on("load", function(event) { resizeVideos(); });
$(window).on("resize", function(event) { resizeVideos(); });

$(document).ready(function() {
  aspectRatio = $('iframe').height() / $('iframe').width();
});

Responsive tables

For our customers it is quite normal to see pretty long tables, long with respect to the number of columns, so tables similar to this:

Header 1 | Header 2 | Header 3 | Header 4 | Header 5 | Header 6 | Header 7 | Header 8

A common design pattern is to reorient the table vertically, so that each row in the table becomes a small table in itself, for example:

Header 1
Header 2
Header 3
Header 4
Header 5
Header 6
Header 7
Header 8

On the web you may find a lot of ECMAScript code that transforms tables from a horizontal to a vertical layout, but so far I have not found a generic fix, so I decided to do it myself. The following ECMAScript code will transform tables marked with the responsivetable class (see the markup example after the CSS below).

$(document).ready(function() {
    // Collect the header texts of the responsive table.
    var headers = $('.responsivetable th').map(function() { return $(this).text(); });

    // Build a <style> element that, on small screens, prefixes every cell
    // with the text of its column header via a :before pseudo-element.
    var generated_css = '<style>';
    generated_css = generated_css + '@media only screen and (max-width: 760px) {';

    for (var i = 0; i < headers.length; ++i) {
        var j = i + 1; // nth-of-type is 1-based
        generated_css = generated_css +
                        '.responsivetable td:nth-of-type(' +
                        j +
                        '):before { content: "' +
                        headers[i] +
                        '"; }';
    }

    generated_css = generated_css + '} </style>';

    $(generated_css).appendTo("head");
});

It requires a bit of CSS styling as well:

@media
only screen and (max-width: 760px)  {

    /* Force table to not be like tables anymore */
    .responsivetable table,
    .responsivetable thead,
    .responsivetable tbody,
    .responsivetable th,
    .responsivetable td,
    .responsivetable tr {
        display: block;
    }

    /* Hide table headers (but not display: none;, for accessibility) */
    .responsivetable thead tr {
        position: absolute;
        top: -9999px;
        left: -9999px;
    }

    .responsivetable tr { border: 1px solid #ccc; }

    .responsivetable td {
        /* Behave  like a "row" */
        border: none;
        border-bottom: 1px solid #eee;
        position: relative;
        padding-left: 40% !important;
    }

    .responsivetable td:before {
        /* Now like a table header */
        position: absolute;
        /* Top/left values mimic padding */
        top: 6px;
        left: 6px;
        width: 45%;
        padding-right: 10px;
        white-space: nowrap;
    }

}
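For completeness, the markup that the script and CSS above expect could look like this; the class sits on a wrapper around the table, and the headers and cells are of course just examples:

<div class="responsivetable">
  <table>
    <thead>
      <tr><th>Header 1</th><th>Header 2</th></tr>
    </thead>
    <tbody>
      <tr><td>Value 1</td><td>Value 2</td></tr>
      <tr><td>Value 3</td><td>Value 4</td></tr>
    </tbody>
  </table>
</div>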


Diazo bug on HTML5 validation errors

We have been using Diazo for a long time for styling client sites. I believe the most efficient way of doing Plone theming is, without doubt, to use Diazo instead of just applying CSS and overriding templates. The clear advantages of using Diazo are:

  • Theme reusability (even outside Plone),
  • Theme developers do not require knowledge of Plone,
  • Possibility of using XSLT without extra modules,
  • Easy backend development (write the markup without thinking about theming).

I am sure there are even more advantages to using Diazo over classic styling.

In the transition to using Diazo we have, of course, experienced some problems. Just learning how Diazo rules work takes some effort, and some of the software is certainly not bug-free. For example, we initially tried to do the transformation in Apache, but found out it was not really possible (I cannot remember the details; something like the path condition did not work). Then we moved on to a modified version of nginx that had extended support for e.g. path conditions. This worked great 99% of the time, but we found weird bugs like fuzzy rendering of pages, and sometimes the XSLT transformation was not done at all, which was hard to reproduce/debug.

What we really found to work was Plone’s built-in support for Diazo (plone.app.theming). This works great; we run it on many production sites and it is very stable. One day we got a bug report from our Q/A department: they had found that on some of our customers’ sites the HTML5 validation broke due to the use of line feeds in the description field. The particular elements that broke the HTML5 validation were the meta description and the title attribute on the links in e.g. the navigation. Using line feeds in the description results in a meta description tag like this:

<meta name="description" content="test test test&#13;&#10;test test test" />

Every time this page is referenced from the navigation portlet, you will see that the title attribute contains the same escaped characters. I tried tracing where the line feeds were escaped, and found that it is done in lxml. So what to do? In XHTML/HTML4 this is probably accepted behavior and will not break validation, but lxml does not really know whether we are dealing with HTML5 or XHTML/HTML4. I filed the bug (#13871), but unfortunately there has been no response so far; I expect people are busy working on Plone 5. I see two (three) solutions to this problem:

  1. Ensure that the description fields do not allow line feeds. In case an editor puts in text that contains line feeds, just strip them out and maybe replace them with white space (see the sketch after this list). We may also need to migrate previous content.
  2. Fix the bug in lxml; this would involve C programming and would also be tricky due to the XHTML/HTML4 vs. HTML5 problem described earlier.
  3. Use an ITransform adapter that removes the line feeds; see my example here (which is what we are currently using).
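As a minimal illustration of option 1, the stripping itself could be as simple as this hypothetical helper (not code from any of the packages mentioned above); where you hook it in depends on whether the content type is Archetypes- or Dexterity-based:

# Hypothetical helper illustrating option 1: replace line feeds in a
# description with single spaces before the value is stored.
def normalize_description(text):
    if not text:
        return text
    # Collapse CR/LF (and any resulting double spaces) into single spaces.
    return ' '.join(text.replace('\r', ' ').replace('\n', ' ').split())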

I am sure the first one is the way to go. When I get some spare time I will try to see how it works out on Dexterity-based content types. What do you think? Feel free to comment on the bug (#13871).