Oct 8 2010

Moodle development traffic 39/2010

Latest stable version 1.9.9+

There are 3 commits into the stable branch from the last development week (Tue 28 to Mon 4). Andrew Zoltay spotted a bug in the login form on sites with the “Use HTTPS for logins” setting turned on and provided a patch, which Petr Skoda committed (MDL-24225). The two other commits are just trivial code cleanups: Rossiani Wijaya removed whitespace in the code she committed the week before, and Andrew Davis fixed a table comment in one of the XMLDB files, spotted by Eloy Lafuente.

Moodle 2.0 RC1

There are 106 commits into the future release branch from the last week. The main community site http://moodle.org was upgraded to the 2.0 engine over the weekend, which helped the core developers discover some forgotten bugs and incompatible customizations.

Quote of the week

“Raarrrrrrrr fixed way old bug from dml conversion”
Sam Hemelryk really enjoys a bug-fixing rampage

Parsing uploaded string files in AMOS

I was working on a new AMOS feature that allows users (language pack maintainers and translation contributors) to upload their translations to AMOS and either include them in the main repository or offer them for inclusion. The first supported format is the standard Moodle string file, which is valid PHP code defining an associative array $string. The problem was that, for obvious security reasons, I cannot just let anonymous users upload arbitrary PHP code and execute it. Imagine what would happen if an attacker came and uploaded a file like

<?php
global $CFG;
$string['something'] = $CFG->dbpass;

If I just executed such a file via include(), the user would get access to sensitive server configuration data. So it was clear I had to write my own parser that extracts the array from the file without actually executing it.

I initially tried to solve this by searching the file contents for patterns using regular expressions. That worked pretty well in simple cases, but the unit tests (of course I started with them) quickly failed on more complex samples that included commented-out lines, block comments or strings that contained the interesting patterns themselves.

<?php
// $string['lalala'] = 'Knock knock';
/*
$string['grrrr'] = 'Who\'s there?';
*/
$string['nasty'] = '$string[\'nasty\'] = \'Funny heh?\';';

So I realized regexps are not suitable for this kind of task. Another way to deal with comment blocks would be stream processing of the file contents, with a lot of flags like “inside a comment”, “after line comment mark” or “waiting for variable name”. But that was evidently reinventing the wheel, and the wheel would have ended up square anyway. PHP itself has to do this boring job when analysing the source, so I just had to learn how to use its results.

The wise may already have guessed that I ended up with the tokenizer extension. The tokenizer functions provide an interface to the PHP tokenizer embedded in the Zend Engine. Using these functions you may write your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level. The parser method I finally implemented calls token_get_all() to get an array of the tokens found in the uploaded file and picks the patterns that are considered valid string definitions.

For the purpose of uploading a string file into AMOS, a valid definition is something like ‘T_VARIABLE $string followed by [ followed by T_CONSTANT_ENCAPSED_STRING followed by ] followed by assignment = followed by T_CONSTANT_ENCAPSED_STRING followed by semicolon’. All other tokens like T_WHITESPACE, T_INLINE_HTML, T_COMMENT or T_DOC_COMMENT are just ignored.

As you can see this means that code like

$string['greeting'] = 'Hello ' . 'world';

is considered a syntax error by the AMOS import even though it is valid PHP code. But I am sure that is OK, as there is no real reason to support every thinkable way of defining a string (heredoc etc.).
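
The token pattern described above can be sketched roughly as follows. This is only an illustration, not the real AMOS parser: the function names extract_strings(), is_token() and decode_single_quoted() are made up for this example, only single-quoted literals are decoded, and anything that does not match the pattern is rejected with an exception.

```php
<?php
// Hypothetical sketch, not the actual AMOS code.

// Decode a single-quoted PHP literal; other literal forms are not supported here.
function decode_single_quoted(string $literal): ?string {
    if (strlen($literal) < 2 || $literal[0] !== "'") {
        return null;
    }
    return strtr(substr($literal, 1, -1), ["\\\\" => "\\", "\\'" => "'"]);
}

// True if $token is an array token with the given tokenizer id.
function is_token($token, int $id): bool {
    return is_array($token) && $token[0] === $id;
}

// Extract $string['key'] = 'value'; definitions without executing the code.
function extract_strings(string $source): array {
    $ignored = [T_OPEN_TAG, T_CLOSE_TAG, T_WHITESPACE, T_INLINE_HTML, T_COMMENT, T_DOC_COMMENT];
    $tokens = [];
    foreach (token_get_all($source) as $token) {
        if (!is_array($token) || !in_array($token[0], $ignored, true)) {
            $tokens[] = $token;
        }
    }
    $strings = [];
    // Every definition must consist of exactly seven significant tokens.
    for ($i = 0, $n = count($tokens); $i < $n; $i += 7) {
        $t = array_slice($tokens, $i, 7);
        $ok = count($t) === 7
            && is_token($t[0], T_VARIABLE) && $t[0][1] === '$string'
            && $t[1] === '['
            && is_token($t[2], T_CONSTANT_ENCAPSED_STRING)
            && $t[3] === ']'
            && $t[4] === '='
            && is_token($t[5], T_CONSTANT_ENCAPSED_STRING)
            && $t[6] === ';';
        $key = $ok ? decode_single_quoted($t[2][1]) : null;
        $value = $ok ? decode_single_quoted($t[5][1]) : null;
        if ($key === null || $value === null) {
            throw new InvalidArgumentException('Unsupported string definition');
        }
        $strings[$key] = $value;
    }
    return $strings;
}

// The nasty sample from above: comments are skipped, the literal is kept intact.
$sample = <<<'PHP'
<?php
// $string['lalala'] = 'Knock knock';
/*
$string['grrrr'] = 'Who\'s there?';
*/
$string['nasty'] = '$string[\'nasty\'] = \'Funny heh?\';';
PHP;

print_r(extract_strings($sample));
```

Because the uploaded file is only tokenized and never executed, a file that reads $CFG or concatenates strings simply fails the pattern check.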


Aug 27 2010

Moodle development traffic 33/2010

Latest stable version 1.9.9+

There is just one commit into the stable branch from the last development week (from Tue Aug 17 to Mon Aug 23). Sam Marshall fixed a bug in the library that handles the display of side blocks. Because of the bug, column space was reserved on the left or right side of the screen whenever there was an instance of a block, even if that type of block was disabled at the given site (MDL-23871).

Future version Moodle 2.0 Preview 4

There are 85 commits into the main development branch from the last week. The branch is very near to the feature freeze point and testers are preparing for the second round of Moodle 2.0 QA testing, which will start together with the first release candidate.

Petr Škoda redesigned the concept of the internal constant CLI_SCRIPT. Until now, this constant held the result of autodetecting whether the script was run via the web or the command line interface. It was used, for example, in if-statements to produce HTML output for browsers and plain text output for the command line. That job is now handled properly by the output rendering mechanism, and the concept of CLI_SCRIPT is different. A developer uses this constant to explicitly declare that a script is supposed to be run via CLI only, by calling define('CLI_SCRIPT', true); before including the main config.php. The autodetection in setup.php then just makes sure that such a script really is called via CLI. Conversely, it is not possible to run a script via CLI if it does not declare CLI_SCRIPT explicitly. This makes the logic of the CLI_SCRIPT constant similar to AJAX_SCRIPT and prevents accidental execution of CLI scripts via the web and of web scripts via CLI. It was therefore necessary to prepare a CLI version of the /admin/cron.php file: if you run cron via CLI, you must now execute the /admin/cli/cron.php script. See MDL-23824 for details.
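
The two-way check described above could look roughly like this. It is a minimal sketch, not the actual code from setup.php: the function cli_declaration_error() and its messages are invented for illustration, and the SAPI name is passed in as a parameter so that the logic can be tried out anywhere.

```php
<?php
// Hypothetical guard, not Moodle's real setup.php logic.
// $declaredcli: whether the script called define('CLI_SCRIPT', true)
// $sapi: the value of php_sapi_name(), e.g. 'cli' or 'apache2handler'
function cli_declaration_error(bool $declaredcli, string $sapi): ?string {
    $iscli = ($sapi === 'cli');
    if ($declaredcli && !$iscli) {
        return 'CLI scripts can not be executed via the web';
    }
    if (!$declaredcli && $iscli) {
        return 'Web scripts can not be executed via CLI';
    }
    return null; // declaration and reality agree
}

// A script that wants to run from the command line declares it up front.
define('CLI_SCRIPT', true);
$error = cli_declaration_error(defined('CLI_SCRIPT') && CLI_SCRIPT, php_sapi_name());
if ($error !== null) {
    echo $error, "\n";
}
```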

Quotes of the week

| .
Jordan Tomkinson plays pong game with Penny Leach via Jabber chat room

. |
Penny Leach strikes the ball back to Jordan

“David is going to surely put this in his blog”
Penny Leach was right

Mr Moodle, let me introduce you to Mrs Statistics

The Moodle 2.0 File API uses a kind of content-addressable storage to keep course and user files on the server disk. In short, every file is saved into the file pool under a filename that is calculated as the SHA1 hash of the file content. If a file is copied (for example when a teacher is cloning a course), there is no need (and actually no way) to duplicate the file stored physically on the disk: just a new record is created in a special table of files.

All physical files are stored in the so-called file pool, a directory in moodledata. But as almost every file system has some limit on the number of files or sub-directories per directory, it is necessary to distribute the files into sub-directories within the file pool. And here it gets interesting. The initial implementation of the file pool used three levels of sub-directories to store each file. So if the SHA1 hash of the file content was e.g. 10a68843c08fe4446839961153812b94a1983c6b, such a file would be stored in $CFG->dataroot/filedir/10/a6/88/10a68843c08fe4446839961153812b94a1983c6b

Ashley Holman from NetSpot realized that distributing files into three levels of sub-directories is quite an overkill. If we expect the SHA1 hash to have randomly distributed bit values, the chance of two given files landing in the same sub-directory seems to be around 1 in 16 million (three levels of 8 bits give 2^24, about 16.8 million, directories), so at the majority of Moodle sites it is pretty unlikely that even two files share the same directory. So we probably use four file descriptors (three directories plus one normal file) in the filesystem to keep a single file. That was considered a waste of OS resources and the issue was filed as MDL-23885.

Eloy Lafuente mentioned a system he had worked on in the past, which did not use a fixed number of directory levels but scaled itself internally as needed, so it would use just one directory level on small sites with several dozen files and more levels on bigger sites with zillions of files. “I can say it has worked perfectly, scaling from 0 to 10 millions of files (today), with all the directories being filled as the system grows along the last 8 years (without waste of directory descriptions),” Eloy shared his experience.

Tim Hunt from the Open University was asked to come up with a statistical analysis of the problem: “For N files (where N is an estimate of the most uploaded files any Moodle will ever have), come up with a directory structure for SHA1 hashes, so most folders contain ~1.000 files or sub-directories, and no sub-directory ever goes above 32.000 files (or at least the chance of that is vanishingly small).” Tim posted a nice report of the results and it was decided that there will be just two levels with 8 bits each to group the files in the file pool. It must be said that the structure of the file pool is an internal implementation detail of the File API layer and no-one is ever supposed to access it directly. Therefore it was not difficult to modify the code according to the results of Tim’s research.

It is nice when software development is not just about the programming language.
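
To make the directory arithmetic concrete, here is a small sketch. The function names filepool_path() and expected_files_per_directory() are invented for this example and are not part of the File API; the path is relative to $CFG->dataroot and each level takes 8 bits, i.e. two hexadecimal characters of the hash.

```php
<?php
// Hypothetical helpers for illustration only, not the File API implementation.

// Build the pool path for a content hash using the given number of 8-bit levels.
function filepool_path(string $contenthash, int $levels = 2): string {
    $parts = ['filedir'];
    for ($i = 0; $i < $levels; $i++) {
        $parts[] = substr($contenthash, 2 * $i, 2);
    }
    $parts[] = $contenthash;
    return implode('/', $parts);
}

// Average number of files per leaf directory, assuming evenly distributed hashes.
function expected_files_per_directory(int $files, int $levels): float {
    return $files / (256 ** $levels);
}

$hash = sha1('some file content');
echo filepool_path($hash, 3), "\n"; // the original three-level layout
echo filepool_path($hash, 2), "\n"; // the two-level layout that was decided on
printf("%.2f\n", expected_files_per_directory(1000000, 2));
```

With two levels there are 256 × 256 = 65536 leaf directories, so even a site with a million files averages only about 15 files per directory.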

Post scriptum

var_dump('php_sucks' == 0);


Jun 18 2010

Moodle development traffic 23/2010

Latest stable version 1.9.9+

There were 5 commits into the stable branch in the last development week (from Tuesday Jun 8 to Monday Jun 14). Martin Dougiamas bumped the version to 1.9.9 and fixed a potential memory overflow problem occurring during the activity import when a teacher is enrolled in many courses (MDL-19880). This fix caused a regression, spotted and patched by Alan Trick and committed by Eloy Lafuente (MDL-22740). Tim Hunt committed a patch provided by Vadim Dvorovenko, fixing a typo that broke the questions restore process (MDL-22720). Gordon Bateson committed a patch submitted by Ramon Eixarch, fixing a questions import problem with the Hotpot formats JMatch and JMix (MDL-22726).

Security announcements for Moodle 1.9.9 were published yesterday at our MSA page. Moodle 1.9.9 fixes four security problems; two of them are considered critical and one of them major. Registered administrators were notified and encouraged to upgrade their sites before the detailed description of these issues was published, see Moodle security procedures for details.

Future version Moodle 2.0 Preview 3

There were 76 commits into the main development branch during the last week. The repository plugins mahara and remotemoodle were moved from the standard distribution into contrib.

Quotes of the week

“Oh dear I’m a geek. New neighbours move into the next flat while their extension’s built. How do I welcome them? Let them connect to my wifi”
Tim Hunt

“Only people that truly hate PHP can program something with it. The others who love it are not programmers :-D”
Petr Škoda


Mar 30 2010

Moodle development traffic 12/2010

Latest stable version is 1.9.8

On 25th March 2010, Moodle 1.9.8 was released. See Release notes for details. As usual, admins are advised to upgrade production servers.

There are 6 commits into MOODLE_19_STABLE from the last week done by humans. As always in this blog, commits by Moodle Robot are excluded from these statistics. Moodle Robot tirelessly increases the build number in version.php every day (yes, paradoxically, that is the line with the “Human-friendly version name” comment). By the way, thanks to this job Moodle Robot is the 10th most active contributor to Moodle code and has even reached Kudo Rank 8 at ohloh.net :-)

In the last commit before Moodle 1.9.8 was released, Petr Skoda fixed proper handling of HTTPS wwwroot in Flash version detection (MDL-21910).

Unstable development version 2.0dev

There were 89 commits into the main development branch last week (from Tuesday 00:00 to Monday 23:59 in my git clone). Among other core developers, Sam Hemelryk committed his recent work on the new themes layout (MDL-21862). In Moodle 2.0, there is a kind of system theme called “base” which defines just very basic CSS. All other themes (including the new “standard”) are built upon this base theme. Web designers should follow http://docs.moodle.org/en/Development:Theme_changes_in_2.0 when preparing themes for the upcoming major release. I committed a series of patches that move language files into their plugin space (MDL-21694). In Moodle 2.0, English strings for activity modules and other plugin types are stored in the plugin space, so there is no difference between core plugins and contributed plugins.

Quotes of the week

“2.0 beta is so much a priority right now I’ve stopped eating”
Martin Dougiamas

“I think reinventing the wheel is usually a good idea personally, but that’s just me. :)”
Sam Marshall

“In PHP, every dog and his master wants to write their own framework.”
Penny Leach

Eins, Zwei, Polizei

If you ask a mother how many children she has, you do not expect her to actually count them. She just knows. Similarly, if you ask a PHP array how many items it contains using the count() function, you expect it to be a very cheap call. Knowing the number of contained items is such a natural thing that an array should return the value almost immediately, at almost no cost. Well, apparently not in PHP. The following trivial script demonstrates the performance difference between using the count() function and keeping the number of items in a separate counter:

define('MAXITEMS', 10000);

function using_array_count() {
    $a = array();
    while (count($a) < MAXITEMS) {
        $a[] = 1;
    }
}

function using_own_counter() {
    $a = array();
    $i = 0;
    while ($i < MAXITEMS) {
        $a[] = 1;
        $i++;
    }
}

$start = microtime(true);
using_array_count();
$t1 = microtime(true) - $start;

$start = microtime(true);
using_own_counter();
$t2 = microtime(true) - $start;

printf("%f %f %f \n", $t1, $t2, $t1/$t2);

On my notebook, the variant using count() is about five times slower than the one using its own counter. Not a big deal, right? But Jens Eremie from Humboldt Universität in Berlin realized that in a more complicated scenario (two arrays involved, resetting the array, keeping the number of items under a given limit etc.) the difference may be significantly higher, even hundreds of times. The scenario Jens was dealing with is Moodle’s cache_context() function. In MDL-19702 you can find a test script showing quite interesting figures. At large Moodle installations with many courses and categories (and therefore many contexts), using count() in a loop makes caching a pain. As a result, places like the MyMoodle page run into unacceptably poor performance and can easily reach the PHP max_execution_time limit. After some not-so-complicated changes, things just run.

During a nice discussion at MoodleMoot in Berlin, Jens explained to me several caching improvements he proposes. Getting rid of count() is just one of them. Jens realized that it does not make sense to shift the internal cache array (that is, to remove the first cached item) after reaching the limit of cached items. As the contexts come into the cache in some order and may be requested again in the same order, the cache may actually never hit if the number of contexts is higher than the cache limit. It is better to pick the item to remove randomly instead of always removing the first one. Then the cache works better in such first-in first-out scenarios. Jens also proposes to follow the common wisdom (used for example in microprocessor caches) of pruning items down to 80% of the cache capacity at once instead of freeing items one by one when needed. And last, but not least, once a cached context is hit, it will probably be used more than once in Moodle code. Therefore items should stay in the cache as long as possible once they have been hit, so the random picking should prefer items that have not been hit so far.

Disclaimer: if I am not right or even if I am wrong, there is a bug in my understanding of the issue. I write here what I remember from the discussion with Jens and chances are I misinterpreted something. Blame me, not him. Praise him, not me.
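
The caching ideas described above (an own counter instead of count(), pruning down to 80% at once, random eviction that prefers items not hit so far) could be sketched like this. The class PruningCache is purely hypothetical and is not the patch from MDL-19702 nor the real cache_context() code.

```php
<?php
// Hypothetical illustration of the proposed caching strategy.
class PruningCache {
    private array $items = [];
    private array $hits = [];  // keys of items that have been hit at least once
    private int $count = 0;    // own counter, so count() is never called
    private int $limit;

    public function __construct(int $limit) {
        $this->limit = $limit;
    }

    public function get($key) {
        if (!array_key_exists($key, $this->items)) {
            return null;
        }
        $this->hits[$key] = true;
        return $this->items[$key];
    }

    public function set($key, $value): void {
        if (!array_key_exists($key, $this->items)) {
            if ($this->count >= $this->limit) {
                $this->prune();
            }
            $this->count++;
        }
        $this->items[$key] = $value;
    }

    public function size(): int {
        return $this->count;
    }

    // Drop random items until only 80% of the capacity is used,
    // removing items that were never hit before touching the others.
    private function prune(): void {
        $target = (int) floor($this->limit * 0.8);
        $unhit = array_keys(array_diff_key($this->items, $this->hits));
        $hit = array_keys($this->hits);
        shuffle($unhit);
        shuffle($hit);
        foreach (array_merge($unhit, $hit) as $key) {
            if ($this->count <= $target) {
                break;
            }
            unset($this->items[$key], $this->hits[$key]);
            $this->count--;
        }
    }
}

$cache = new PruningCache(10);
for ($i = 0; $i < 10; $i++) {
    $cache->set($i, "ctx$i");
}
$cache->get(3);           // context 3 has been hit, so it should survive pruning
$cache->set(10, 'ctx10'); // the cache is full: prune to 8 items, then add one
```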

Post scriptum

PostgreSQL’s EXPLAIN ANALYSE rocks.


Feb 17 2009

Course contents block

I published a simple Moodle block called Course contents. It generates a list of all visible topics/weeks in the course. Clicking on one of these links displays that particular week or topic. What I needed to solve was how to automatically obtain a title for every course section.

The block extracts a suitable title for every week or topic from the section summary. If you start the summary with a heading (H1, H2, H3, etc), it will use the heading text. If your summary starts with bold text, that will be used as the section title. If the summary consists of several paragraphs, the first one will be used. If the summary is empty, a customizable text “Unit X” (where X is the number) is displayed.

Technically speaking, the plain text content of the first non-empty HTML DOM node of the section summary is used as the section title. I realized that Moodle 1.9 does not contain any HTML parser, so the block source code is shipped with its own copy of the Simple HTML DOM parser library (credit goes to S. C. Chen and other contributors).
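
The title extraction can be approximated with a few lines of PHP. Note that this sketch swaps in PHP's built-in DOMDocument (the dom extension) instead of the Simple HTML DOM library the block actually ships with, looks only at the top-level nodes of the summary, and uses a made-up function name section_title().

```php
<?php
// Hypothetical sketch using DOMDocument, not the block's real code.
function section_title(string $summary, int $number): string {
    $fallback = "Unit $number";
    if (trim($summary) === '') {
        return $fallback;
    }
    $dom = new DOMDocument();
    // Summaries are user-written HTML, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<div>' . $summary . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return $fallback;
    }
    // Use the plain text of the first node that is not empty.
    foreach ($root->childNodes as $node) {
        $text = trim($node->textContent);
        if ($text !== '') {
            return $text;
        }
    }
    return $fallback;
}

echo section_title('<h2>Week one</h2><p>Getting started</p>', 1), "\n";
echo section_title('', 3), "\n";
```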


Jan 22 2009

Forcing Moodle and any other PHP application to always display errors

Again, I ran into a situation where a very o{l|d}d installation of Moodle displayed just an empty page instead of a useful error message. That’s why I again had to play a bit with all these error_reporting and display_errors settings. Let me summarise how it works, at least as far as I know.

There are basically four places where PHP configuration parameters can be defined: 1) php.ini, 2) httpd.conf, 3) .htaccess and 4) the source code itself.

The file php.ini contains site-wide PHP configuration. You can define parameters by assigning them a value – eg display_errors=1

PHP configuration can be defined in the Apache configuration as well (either in httpd.conf or in a sub-config file included by it). This allows you to redefine defaults from php.ini for particular directories. You can use statements like php_flag or php_value to do it. Or you can use php_admin_flag or php_admin_value to do the same, with the exception that the latter forms cannot be redefined at a lower level again.

If you have “AllowOverride Options” defined in Apache for the given location, you can use .htaccess files to redefine PHP settings. The statements php_flag or php_value can be used in .htaccess files. The forms php_admin_* are not supported at this level.

Finally, you can modify some settings with the ini_set() or error_reporting() PHP functions. It seems to me that these are at the same level as .htaccess is. I mean, IMHO there is no way to disable ini_set() in .htaccess.

What I needed was to prevent Moodle code from calling error_reporting(0) and ini_set('display_errors', 0) effectively. Therefore I had to go to the relevant httpd.conf section and use

php_admin_flag display_errors on
php_admin_value error_reporting 2147483647

With these settings, neither .htaccess nor the source code itself is able to hide any error message any more.
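
One way to check whether the lock is in place is to let a script try to hide errors and see if the attempt succeeds. The following is only a sketch with an invented function name, try_to_hide_errors(); it relies on ini_set() returning false when a setting cannot be changed, which as far as I know is what happens for values fixed via php_admin_flag or php_admin_value.

```php
<?php
// Hypothetical diagnostic: can the running code still switch display_errors off?
function try_to_hide_errors(): array {
    $before = ini_get('display_errors');
    $result = ini_set('display_errors', '0');
    $after = ini_get('display_errors');
    if ($result !== false) {
        // The change went through, so put the original value back.
        ini_set('display_errors', (string) $before);
    }
    return [
        'before' => $before,
        'after' => $after,
        'locked' => ($result === false),
    ];
}

var_dump(try_to_hide_errors());
```

Run through the web server in the affected directory, 'locked' should come out true once the php_admin_* lines are active; from the command line the setting is normally changeable.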




