Apr 13 2010

Moodle development traffic 14/2010

Latest stable version 1.9.8+

There are 7 commits into the stable branch from the last week. Tim Hunt committed David Binney’s patch fixing a question bank SQL query at Oracle (MDL-21828). Iñaki Arenaza fixed an old issue when he committed his patches that significantly increase the quality of images produced by TeX filter. If the binaries of latex, dvips and convert are available at the system, Moodle can generate nice PNG formulas. Otherwise, “ugly” GIFs are created by mimetex (MDL-10197). Another Iñaki’s committed patch allows to use HTML editor when editing login page instructions (MDL-19053) and yet another one fixes small typo in CAS, DB and LDAP authentication plugins (MDL-18689).

Unstable development version 2.0dev

There are 160 commits at the main development branch from the last week, many of them done by Petr Å koda during the weekend strings clean-up. Patrick Malley added a new test theme called Boxxie based on his Greenie theme for Moodle 1.9. On Wed Apr 7 00:31:20 2010 UTC, the last ever run of Eloy’s installer_builder generated and committed strings for Moodle installer and was turned off. In the near feature, it will be replaced with a new script taking the data from our new strings repository called AMOS. So long installer_builder, and thanks for all the strings!

Quotes of the week

“I like git because it might actually force us to fix our sloppy/missing review processes before things hit CVS”
Petr Å koda guards the code against eager commits – get alarmed!

“NOBUG: Bombing BOM: BOM bombed”
Eloy Lafuente and his bombastic commit message

Milliseconds count

At the moment, I have 3,740,067  records (yes, 3.74 millions of records) in the new database repository of Moodle strings. The repository (basically one huge SQL table) contains one record per every modification of every single string that happened since Moodle 1.6 in any language. It takes approximately two hours to re-populate the table from our git mirrors of Moodle core and Moodle langs CVS repositories. There are some critical queries that are used quite frequently – for example to take a snapshot of the most recent string values for the given component, language and branch. That is, for example, “what is currently saved in lang/en/moodle.php at the main development branch?” (or what was there at the given timestamp). Writing this query was a big lesson for me. Not only it was actually for the first time I ever used an SQL subselect. Having so many records, the query optimising became very important as I refused to wait dozens of seconds to actually see some results (and I can imagine translators  would not be happy with such delays, too).

Basically, all Moodle strings and their history is now stored in one database table containing fields branch (integer code for Moodle branch, like 1900 representing MOODLE_19_STABLE), lang (language code like ‘es’), component (the string domain – or the file where the string is defined, like ‘forum’), stringid (the key is $string array), text (the string itself) and finally timestamp (since when the change described by this record  is valid). There are other fields defined in the table – to hold the author of the change, commit message, deletion flag and others. There is composited index created for fields  (component, language, branch).

This is the query I was using during the optimisation and performance comparison. It returns the most recent version from several branches, several languages and several components – which is exactly what I need for the new translation interface.

SELECT r.id, r.stringid, r.text, r.timemodified, r.deleted
  FROM  mdl_amos_repository r
  JOIN (SELECT branch,lang,component,stringid,
               MAX(timemodified) AS timemodified
          FROM mdl_amos_repository
         WHERE branch IN (2000,1900)
               AND lang IN ('en','cs')
               AND component IN ('moodle','workshop')
             GROUP BY  branch,lang,component,stringid) j
    ON (r.branch = j.branch
        AND r.lang =  j.lang
        AND r.component = j.component
        AND r.stringid = j.stringid
        AND  r.timemodified = j.timemodified)
  WHERE r.branch IN (2000,1900)
        AND r.lang IN ('en','cs')
        AND r.component IN ('moodle','workshop')
  ORDER BY  r.component, r.stringid, r.lang, r.branch;

The basic idea of the query is to get know (in subselect) which one is the most recent version of every string. Then (via the table self join) get some additional information about such string. I was experimenting with various variants of the query -like putting those WHERE conditions into the inner subselect or the outer main select. None of them was good enough until Petr Å koda gave me a world-saving hint: put the same WHERE conditions into both inner subselect and outer main select! The inner conditions help to filter the rows that are going to be aggregated by GROUP BY. The outer conditions help to join results quickly.

Using this, I managed to get to times like 1500ms at my laptop and 200ms at the testing server, both using PostgreSQL. First attempts to run the same query on MySQL led to significantly worse figures. I am going to do more tests to be sure that both engines have similar resources/conditions available. That massive usage of IN() is supposed to be a challenge for MySQL…

Post scriptum

I fell in love running Moodle 2.0 upgrading script via command line interface – see admin/cli/*.php scripts





film streaming sur Megaupload