Caution: This documentation is for eZ Publish legacy, from version 3.x to 5.x.
For 5.x documentation covering the Platform, see the eZ Documentation Center; for the differences between legacy and Platform, see the 5.x Architecture overview.

URL transformation rules

When a site administrator enters a value for a new virtual URL, the system cleans up the input by applying so-called URL transformation rules. This avoids problems with certain characters and ensures that the alias conforms to the standards and to the other URLs of the site. If an entered alias is modified during this cleanup, the user is notified.

Note that in eZ Publish 3.10, the transformation of entered/generated aliases has changed.

Unicode support

In versions prior to 3.10, URL transformation rules were more restrictive and supported only a small set of ASCII characters (lowercase Latin letters from "a" to "z", digits and underscores). This caused problems for many non-Western languages that use other alphabets, some of which are difficult to transliterate.

From eZ Publish 3.10, it is possible to enable Unicode support for URLs. No transliteration is then needed, since most characters are allowed. The following characters are not allowed: ampersand (&), semicolon (;), forward slash (/), colon (:), equals sign (=), question mark (?), square brackets ([ and ]), parentheses (( and )) and the plus sign (+). Spaces are allowed only as word separators. These characters are excluded to avoid problems related to the HTTP protocol and URL syntax, where several of them have special meanings.
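The character rules above can be sketched as a small PHP function. This is a hypothetical illustration, not eZ Publish's own implementation (which is driven by its transformation rules). The function name sketchCleanAlias() and the choice to replace each disallowed character with a space, so that neighbouring words stay separated, are assumptions.

```php
<?php
// Hypothetical sketch of the alias cleanup rules; not eZ Publish's own code.

// Characters that are not allowed in a Unicode-enabled URL alias.
const DISALLOWED_ALIAS_CHARS = ['&', ';', '/', ':', '=', '?', '[', ']', '(', ')', '+'];

function sketchCleanAlias(string $text, string $separator = '-'): string
{
    // Assumption: each disallowed character becomes a space, so "Q&A" keeps two words.
    $text = str_replace(DISALLOWED_ALIAS_CHARS, ' ', $text);

    // Spaces are only allowed as word separators: drop leading/trailing
    // whitespace and collapse each remaining run into one separator.
    return preg_replace('/\s+/u', $separator, trim($text));
}

echo sketchCleanAlias('Me & You (2010)'), "\n"; // prints Me-You-2010
```

Non-ASCII letters such as "Ü" pass through unchanged, which matches the idea that most characters are allowed once Unicode support is enabled.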

Unicode characters are encoded according to the IRI standard: the text is first encoded as UTF-8, and the resulting bytes are then percent-encoded. The resulting URL contains only characters that are compatible with the HTTP protocol, so it works in existing browsers and HTTP clients. Modern browsers decode such URLs and display the original Unicode characters.
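A minimal sketch of the UTF-8 plus percent-encoding step, using PHP's built-in rawurlencode() on each path segment. The helper name sketchEncodeIriPath() is hypothetical, and the source file is assumed to be saved as UTF-8. rawurlencode() also escapes a few ASCII punctuation characters (such as "!") that a strict IRI-to-URI mapping would leave alone, so treat this as an approximation.

```php
<?php
// Hypothetical helper: map a UTF-8 alias path to an ASCII-only URL path.
function sketchEncodeIriPath(string $utf8Path): string
{
    // Encode each segment separately so the "/" separators are kept.
    $segments = explode('/', $utf8Path);
    // rawurlencode() percent-encodes every byte outside A-Z, a-z, 0-9, "-", "_", ".", "~".
    return implode('/', array_map('rawurlencode', $segments));
}

$encoded = sketchEncodeIriPath('News/Über-uns');
echo $encoded, "\n";               // prints News/%C3%9Cber-uns
echo rawurldecode($encoded), "\n"; // prints News/Über-uns
```

The "Ü" becomes the two UTF-8 bytes C3 9C, each percent-encoded; rawurldecode() reverses the mapping, which is roughly what a browser does when it displays the Unicode form.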

Dash/underscore/space

In versions prior to 3.10, only underscores were allowed as word separators. From 3.10, the word separator can be chosen with the "WordSeparator" configuration directive in the [URLTranslator] section of a "site.ini" override. It can be set to "dash", "underscore" or "space". This setting is ignored when the "urlalias_compat" transformation method is used, since that method supports only underscores as separators.
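A sketch of such an override, for example in settings/override/site.ini.append.php, written in the same style as the other settings files on this page. The "TransformationGroup" directive, which selects the transformation method ("urlalias", "urlalias_iri" or "urlalias_compat"), is taken from the stock site.ini as an assumption; check the [URLTranslator] section of the site.ini shipped with your version before relying on it.

```php
<?php /*
[URLTranslator]
# Word separator: "dash", "underscore" or "space".
WordSeparator=dash
# Assumed directive: keep Unicode characters in aliases (IRI encoding).
TransformationGroup=urlalias_iri
*/ ?>
```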
 

Case sensitivity

When the "urlalias" or "urlalias_iri" transformation method is used, URL aliases keep the mixed case (uppercase and lowercase letters) of the original text. This differs from the old behavior, where every letter was converted to lowercase. The system stores the URL aliases with their original case, but URL lookups are not case sensitive. For example, with a dash as the word separator, the URL alias for a node called "About Us" is "About-Us", and the node is accessible through "www.example.com/about-us", "www.example.com/About-us", "www.example.com/ABOUT-US" and any other combination of lowercase and uppercase letters.
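A sketch of case-preserving storage with case-insensitive lookup. The function sketchFindNodeByAlias() and the alias-to-node-ID array are hypothetical stand-ins for eZ Publish's URL alias table, not its real API. strcasecmp() ignores case for ASCII letters only; a Unicode-aware version would need something like mb_strtolower() (from the mbstring extension) on both sides.

```php
<?php
// Hypothetical sketch: aliases keep their original case when stored,
// but incoming paths are matched without regard to case.
function sketchFindNodeByAlias(array $aliasToNodeId, string $requestPath): ?int
{
    foreach ($aliasToNodeId as $storedAlias => $nodeId) {
        // strcasecmp() returns 0 when the strings match ignoring ASCII case.
        if (strcasecmp((string) $storedAlias, $requestPath) === 0) {
            return $nodeId;
        }
    }
    return null; // no alias matches
}

$aliases = ['About-Us' => 42]; // stored with its original case
var_dump(sketchFindNodeByAlias($aliases, 'about-us')); // int(42)
var_dump(sketchFindNodeByAlias($aliases, 'ABOUT-US')); // int(42)
```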

Note that if two nodes in the same location have nearly identical names, such as names that differ only in letter case (for example "My article" and "My Article" inside a folder called "News"), the system generates a unique URL alias for each newly introduced conflicting node by appending a number to it. For example, if a node called "My article" already exists and "My Article" is created at the same location, the URL alias of the second node ("My Article") will be "My-Article2". If a third node called "MY Article" is introduced, its URL alias will be "MY-Article3", and so on.
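The numbering behavior described above can be reproduced with a short loop. This is a hypothetical sketch (the name sketchUniqueAlias() is illustrative, not an eZ Publish function); it assumes conflicts are detected case-insensitively, consistent with the case-insensitive lookups.

```php
<?php
// Hypothetical sketch of the numbering scheme for conflicting aliases.
function sketchUniqueAlias(string $alias, array $existingAliases): string
{
    // Compare case-insensitively, since lookups ignore case.
    $taken = array_map('strtolower', $existingAliases);

    $candidate = $alias;
    $counter = 2; // the first duplicate gets the suffix 2
    while (in_array(strtolower($candidate), $taken, true)) {
        $candidate = $alias . $counter;
        $counter++;
    }
    return $candidate;
}

$existing = ['My-article'];
$second = sketchUniqueAlias('My-Article', $existing); // My-Article2
$existing[] = $second;
$third = sketchUniqueAlias('MY-Article', $existing);  // MY-Article3
```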

Alias text filtering

Filtering support makes the generation of aliases more flexible. The system applies the filters to the text before the result is transformed into a valid alias. Filters are implemented in extensions. To create a new filter:

1. Create an extension (for example, the "myfilter" extension, by creating the folder extension/myfilter).
2. Enable the extension by adding its name to the "ActiveExtensions[]" array in the [ExtensionSettings] section of settings/override/site.ini.append.php.
3. In extension/myfilter/settings/site.ini.append.php, add the new filter class (for example, "StripWords") to the "FilterClasses[]" array in the [URLTranslator] section:

<?php /*
[URLTranslator]
FilterClasses[]=StripWords
*/ ?>

4. Create the file containing the StripWords class, for example in the "extension/myfilter/urlfilters" directory:

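<?php
// Example URL alias filter: removes the string "window" from the text
// before it is turned into a URL alias.
class StripWords extends eZURLAliasFilter
{
    function process( $text, $languageObject, $caller )
    {
        // $languageObject and $caller are not needed by this simple filter.
        return str_replace( "window", "", $text );
    }
}
?>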

The filter class "StripWords" implements a method called "process" which takes three parameters: the text to filter, the language object (eZContentLanguage) and the node (eZContentObjectTreeNode) for which the URL alias is being generated. The method returns the filtered text. In this example, every occurrence of the string "window" is removed (replaced with an empty string). Because str_replace() is case-sensitive and also matches inside longer words, "Window" is left untouched while "windows" becomes "s". After this filter is introduced, newly generated URL aliases will no longer contain the lowercase string "window".

5. Regenerate the autoload array for extensions by using the admin interface in "Setup > Extensions" or by executing the following command:

php bin/php/ezpgenerateautoloads.php -e

Refer to the [URLTranslator] section of "site.ini" for more information about the "FilterClasses" setting (called "Filters" before eZ Publish 4.3).

6. Optional: Run the updateniceurls.php script to update the existing URLs:

php bin/php/updateniceurls.php --update-nodes
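A hypothetical variant of the StripWords filter that removes whole words from a list regardless of case (so "Window" is stripped, while "Windows" is kept). The class name StripWordList and the word list are illustrative, and the process() signature simply mirrors the StripWords example. The conditional stub base class exists only so the sketch runs outside eZ Publish; inside an installation the real eZURLAliasFilter is used, and the class would be registered through FilterClasses[] as in step 3. The \b word boundaries are ASCII-based.

```php
<?php
// Stand-in base class so this sketch runs outside eZ Publish.
if ( !class_exists( 'eZURLAliasFilter' ) )
{
    class eZURLAliasFilter
    {
    }
}

// Hypothetical filter: removes whole words from a list, ignoring case.
class StripWordList extends eZURLAliasFilter
{
    // Illustrative word list, not an eZ Publish setting.
    private $words = array( 'window', 'draft' );

    function process( $text, $languageObject, $caller )
    {
        foreach ( $this->words as $word )
        {
            // \b limits the match to whole words; the "i" flag ignores case.
            $text = preg_replace( '/\b' . preg_quote( $word, '/' ) . '\b/i', '', $text );
        }
        // Collapse the double spaces left behind by removed words.
        return trim( preg_replace( '/\s+/', ' ', $text ) );
    }
}

$filter = new StripWordList();
echo $filter->process( 'Open Window or Windows draft', null, null ), "\n"; // prints Open or Windows
```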

Julia Shymova (14/09/2010 12:21 pm)

Andrea Melo (05/09/2012 2:48 pm)

Geir Arne Waaler, Andrea Melo


Comments

  • Filters[] is replaced in 4.3 by FilterClasses[]=MyFilterClass

    hope this saves someone else an hour or 2.