URL and 404 Problem

URL & Naming Recommendations

October 18, 2004
Saul Bottcher (saul.bottcher@sympatico.ca)
Please excuse any poor translations in this document.



Summary/Objective

URLs began as a solution to a technical problem – how to identify internet resources under a common scheme. The technical person views a URL as something that maps from a string to a file system and/or application being accessed by the web server. This style of URL is machine-centric, offloading the work of remembering and communicating cumbersome URLs to the human.

As the internet has evolved into a mass medium, the URL has taken the role of a linguistic/symbolic marker – used by individuals and media just as often as computer systems. The symbolic understanding of a URL is that it directly represents an abstract information hierarchy, which (automatically and transparently) is mapped to a file system or application via a series of substitutions performed by the web server. This style of URL is human-centric, offloading the work of parsing and converting URLs to the computer.

The recommendations in this document are intended to help implement and support symbolic-style URLs, which will:

• Increase likelihood of visual and aural transmission of URLs
• Increase ease of memorisation and recall
• Reduce typing errors
• Make it possible to guess URLs (both with and without reference to existing hierarchies)
• Make it possible to serve best-match content for unrecognised URLs

Each recommendation is described in the form of a specific rule or technological feature to be adopted, along with implementation notes and recommended priorities. Recommendations are ordered by overall priority (highest first).


Simple Naming

Description: Adopt a rule that URLs be composed of only alphanumeric, slash, dot, and dash characters. These are all unshifted characters (reducing typing errors), commonly used by lay people (easiest to remember and communicate), and are a sufficient minimum set for naming any public resource on our site.

Implementation:
For standard HTML pages, we simply follow the rule when creating names.
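
The naming rule can also be checked mechanically when a page is created. The following is a minimal sketch only; the function name is illustrative and not part of any existing tool on our server.

```python
import re

# Characters permitted by the Simple Naming rule:
# letters, digits, slash, dot, and dash.
ALLOWED_PATH = re.compile(r"[A-Za-z0-9/.-]+")


def is_simple_name(path: str) -> bool:
    """Return True if the path uses only the permitted characters."""
    return ALLOWED_PATH.fullmatch(path) is not None


print(is_simple_name("livingplatform/gettingstarted"))
print(is_simple_name("tiki-index.php?page=Getting+Started"))
```

The first call passes the rule; the second fails because of the "?", "=", and "+" characters.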

For Wiki pages, URLs such as:

greenparty.ca/livingplatform/gettingstarted

must be mapped to:

greenparty.ca/livingplatform/tiki-index.php?page=Getting+Started

Inserting the string “tiki-index.php?page=” is trivial. Converting the page name requires a database lookup (strip whitespace from Wiki names, convert to lowercase, and compare to user-supplied URL). These substitutions would be triggered in areas of the site flagged as a Wiki (e.g., “greenparty.ca/livingplatform/{anything}”); therefore a list of Wiki areas must be maintained as well.
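
As a rough sketch of the Wiki substitution described above: the list of page names stands in for the database lookup, and the set of Wiki areas stands in for the maintained list. All function and variable names here are assumptions made for illustration.

```python
import re
from urllib.parse import quote_plus


def slug(wiki_name: str) -> str:
    """Strip all whitespace from a Wiki page name and lowercase it."""
    return re.sub(r"\s+", "", wiki_name).lower()


def build_index(page_names):
    """Map each simplified name back to the real Wiki page name."""
    return {slug(name): name for name in page_names}


def rewrite_wiki_url(path, wiki_areas, index):
    """Return the internal Wiki URL for a simple path, or None if no match."""
    area, _, page = path.strip("/").partition("/")
    if area.lower() not in wiki_areas or not page:
        return None
    name = index.get(page.lower())
    if name is None:
        return None
    return f"/{area.lower()}/tiki-index.php?page={quote_plus(name)}"


index = build_index(["Getting Started"])
print(rewrite_wiki_url("/livingplatform/gettingstarted", {"livingplatform"}, index))
```

In practice the index would be rebuilt (or queried directly) whenever Wiki pages are added or renamed.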

For PHPWebsite pages, a URL such as:

greenparty.ca/news

must be mapped to:

greenparty.ca/index.php?module=article&view=13

Inserting the string “index.php?module=article&view=” is trivial. Converting page names requires a database lookup that matches human strings (“news”) and converts them to page IDs (13). These substitutions would be triggered in areas of the site flagged as PHPWebsite; therefore a list of PHPWebsite areas must be maintained as well.
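
The PHPWebsite case follows the same pattern, with a table of human-readable names and page IDs. In this sketch the dictionary is a hypothetical stand-in for the database table; only the "news" entry and the ID 13 come from the example above.

```python
from urllib.parse import urlencode

# Hypothetical lookup table: human-readable name -> PHPWebsite page ID.
PAGE_IDS = {"news": 13}


def rewrite_phpwebsite_url(path, page_ids=PAGE_IDS):
    """Return the internal PHPWebsite URL for a simple path, or None."""
    page_id = page_ids.get(path.strip("/").lower())
    if page_id is None:
        return None
    return "/index.php?" + urlencode({"module": "article", "view": page_id})


print(rewrite_phpwebsite_url("/news"))
```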

Priority: High. Garbage names are significantly reducing the likelihood that the media or public will successfully exchange and use URLs for key resources on our site.


Aliases

Description: Adopt a rule that the following equivalents must always be available via aliases:

• Acronyms and full expansions (e.g. “livingplatform” and “lp”)
• Plural and singular forms (e.g. “greenparty.ca/riding/{...}” and “greenparty.ca/ridings/{...}”)
• Alternate spellings and equivalent terms (e.g. “greenparty.ca/policy/sports”, “greenparty.ca/policy/recreation”, “greenparty.ca/policy/sportsandrecreation”)
• Common incorrect spellings and typos.

This allows our site to respond robustly to common human errors and produce the correct page immediately.

Implementation: An alias list. Optionally, for best results, a script that tallies 404 errors from server logs and e-mails a ranked list to web staff on a weekly basis. (This will reveal typos, alternate names, and so forth that were not obvious at the time the page was originally named. Any recurring 404 that was unambiguously intended to reach a specific page should be added as an alias).
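
The tally step of the optional script might look something like the sketch below. It assumes the server writes logs in Apache Common Log Format, which should be confirmed against our actual configuration; the e-mail step is left out.

```python
import re
from collections import Counter

# Assumes Common Log Format, e.g.:
# 1.2.3.4 - - [18/Oct/2004:10:00:00 -0400] "GET /policies HTTP/1.1" 404 512
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) ')


def tally_404s(lines, top=20):
    """Return the most frequently requested missing paths, highest first."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == "404":
            # Lowercase so that differently-cased requests are counted together.
            counts[match.group(1).lower()] += 1
    return counts.most_common(top)
```

A weekly job could pass the open log file to tally_404s and send the resulting list to web staff.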

Priority: For the basic alias list and aliasing policy, high. Viewers are lost when a seemingly correct URL (say, with “policies” instead of “policy”) does not produce the expected page. For the 404 tally script, low.


Bilingual Capabilities

Description: With the expectation that all content will be available in both languages, we should enable full bilingual interchangeability for roots, folders, and “fr”/“en” particles. The desired display language should be determined by the particle, if present, or by the root. The resulting URL would use the root to specify language, and would use sub-folder names from the correct language. Hence, all of the following mappings would occur:

greenparty.ca/plateforme -> greenparty.ca/platform {English content}
greenparty.ca/platform/en -> ditto
en.greenparty.ca/platform -> ditto
greenparty.ca/en/platform -> ditto
greenparty.ca/platform/fr -> partivert.ca/plateforme {French content}
fr.greenparty.ca/platform -> ditto
greenparty.ca/fr/platform -> ditto

This allows for easy switching between languages, either by appending or prepending the particle, or by changing the root, without any worry that the directory tree is in the “wrong” language.

Implementation: A rewriting algorithm that dissects the incoming URL, determines the desired language (based on particle or root), translates subfolder names to the correct language, and constructs the resulting URL from root & subfolder names. A database of subfolder name translations would be needed.
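
One possible shape for the rewriting algorithm is sketched below. The translation table is a hypothetical stand-in for the database of subfolder names, and the sketch assumes that no real folder is ever named “en” or “fr”.

```python
# Language -> official root, and the reverse lookup.
ROOTS = {"en": "greenparty.ca", "fr": "partivert.ca"}
ROOT_LANG = {root: lang for lang, root in ROOTS.items()}

# Hypothetical translation table: one row per folder, keyed by language.
FOLDERS = [{"en": "platform", "fr": "plateforme"}]


def translate(name, lang):
    """Return the folder name in the requested language, if known."""
    for row in FOLDERS:
        if name in row.values():
            return row[lang]
    return name


def canonical_url(host, path):
    """Resolve language from particle or root, then rebuild the URL."""
    host = host.lower()
    parts = [p for p in path.lower().split("/") if p]
    lang = None
    # A language prefix on the host ("fr.greenparty.ca") acts as a particle.
    prefix, _, rest = host.partition(".")
    if prefix in ROOTS:
        lang, host = prefix, rest
    # A particle anywhere in the path takes precedence over the host prefix.
    for part in parts:
        if part in ROOTS:
            lang = part
    parts = [p for p in parts if p not in ROOTS]
    # With no particle, the root decides (defaulting to English).
    if lang is None:
        lang = ROOT_LANG.get(host, "en")
    return "/".join([ROOTS[lang]] + [translate(p, lang) for p in parts])


print(canonical_url("greenparty.ca", "/platform/fr"))
```

Adding a language would mean adding one entry to ROOTS and one column to each row of the translation table.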

Priority: High – we’re a national party in a bilingual country.

Note: This implementation is fully extensible to additional languages. To enable Spanish content, register “partidoverde.ca”, add the “es” particle, and add database entries for Spanish subfolder names. This becomes more relevant if, down the road, we integrate auto-translation software into our webserver.


Suggested Key URLs

Description: Implement the following URLs and their French equivalents. (Curly braces indicate optional text, angle brackets indicate required text).

greenparty.ca/riding{s} -> list of ridings and postal-code lookup
greenparty.ca/riding{s}/<official-name>
greenparty.ca/riding{s}/<elections-canada-number>

greenparty.ca/candidate{s} -> list of current or most-recent candidates
greenparty.ca/candidate{s}/<name>
(Where the candidate’s name can be specified in any of these formats: jimharris, harrisjim, jharris, harrisj, jim.harris, harris.jim, j.harris, harris.j).

greenparty.ca/council{lors} -> current council roster

greenparty.ca/member{s}/<name>
(For council, candidates, former candidates, or any other members who might have a bio page).

greenparty.ca/policy/{topic}
greenparty.ca/policies/{topic}
greenparty.ca/issue{s}/{topic} -> alias for “/policy”

greenparty.ca/platform/{YYYY}

greenparty.ca/donate (now works)
greenparty.ca/join (now works)
greenparty.ca/volunteer (now works)

greenparty.ca/news/{optional date to filter dynamically, otherwise most recent}
greenparty.ca/calendar/{optional date to filter dynamically, otherwise upcoming month}

greenparty.ca/search/<string> -> site-wide search

greenparty.ca/election/{YYYY} -> without the year specified, “election central” for current or most recent election. With the year specified, results, news, and other information for past elections. When a current election is within 1-2 months, “greenparty.ca” may redirect immediately to “/election”.

Implementation: Various database lookups and URL redirects.
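
For the candidate name formats listed above, the accepted variants can be generated from a first and last name rather than entered by hand. This is an illustrative sketch; the function name is an assumption, and names containing spaces, hyphens, or accents would need extra handling.

```python
def name_variants(first, last):
    """Return the eight accepted URL forms of a candidate's name."""
    f, l = first.lower(), last.lower()
    variants = set()
    # Full or initialled first name, in either order, with or without a dot.
    for a, b in ((f, l), (l, f), (f[0], l), (l, f[0])):
        variants.add(a + b)
        variants.add(a + "." + b)
    return variants


print(sorted(name_variants("Jim", "Harris")))
```

Each variant could then be stored as an alias pointing at the candidate's page.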

Priority: Varies. High for things like “/join”, “/policy”, “/platform”. Medium for things like “/election”, “/candidate”, “/riding”. Low for things like “/news”, “/calendar”.


Eliminate Page Extensions

Description: Extensions such as “.html” should never be required – they hold neither meaning nor interest to the user, reduce transmission of URLs, and are a source of errors.

Implementation: For any page that is not a PHPWebsite or Wiki page, either append “.html” to the URL, or serve “index.html” from the directory specified by the URL. (Note, this assumes no directory will contain both an HTML file and a subdirectory with the same name, which is a fine rule to follow anyway).
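
A minimal sketch of this lookup follows, assuming static pages live under a single document root on disk. The function name and the order of the two checks are assumptions, not a description of the current server setup.

```python
from pathlib import Path


def resolve_static(doc_root: Path, url_path: str):
    """Return the HTML file for an extensionless URL, or None if absent."""
    parts = [p for p in url_path.split("/") if p]
    if ".." in parts:
        return None  # refuse to look outside the document root
    base = doc_root.joinpath(*parts)
    # Always consider "index.html" inside the named directory.
    candidates = [base / "index.html"]
    if parts:
        # Try "<name>.html" first when a name was given.
        candidates.insert(0, base.with_name(base.name + ".html"))
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

For example, with “policy/forestry.html” and “policy/index.html” on disk, “/policy/forestry” resolves to the first file and “/policy” to the second.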

Priority: High – more URL garbage.


Smart Search for Unrecognised URLs

Description: When an unrecognised URL is requested, a smart search feature is activated. This search returns an intelligent ranking of potential destinations. (When a URL is partially correct, the correct portion should be used to weight search hits. Folder names should carry more weight than body text. Example: “greenparty.ca/forestry” should give “greenparty.ca/policy/forestry” as its highest-ranked result, likely followed by “greenparty.ca/policy”, as the latter would contain the word “forestry” in the body text). Smart search should respect the specified language and any other identifiable particles present in the incorrect URL. (So, “partivert.ca/forestry/pdf” should return “partivert.ca/politique/sylviculture/pdf” as its top hit).

Implementation: Redirect all bad URLs to a search algorithm that dynamically generates result pages. The search algorithm would very likely need to be custom-coded to respect other features, such as viewing mediums, language particles, etc.
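
The weighting idea (folder names count for more than body text) can be illustrated with a very small ranking function. The weights and the in-memory page table are assumptions chosen for the example; this sketch ignores language and viewing-medium particles, which a real version would need to handle.

```python
FOLDER_WEIGHT = 10  # a term matching a folder name in the URL
BODY_WEIGHT = 1     # a term appearing in the page's body text


def rank(bad_path, pages):
    """Rank known pages against the terms of an unrecognised URL.

    pages maps each known URL path to its body text.
    """
    terms = [t for t in bad_path.lower().split("/") if t]
    scored = []
    for url, body in pages.items():
        folders = url.lower().strip("/").split("/")
        words = body.lower().split()
        score = sum(
            FOLDER_WEIGHT * (term in folders) + BODY_WEIGHT * (term in words)
            for term in terms
        )
        if score:
            scored.append((score, url))
    # Highest score first; ties broken alphabetically by URL.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]


pages = {
    "/policy/forestry": "our forestry policy",
    "/policy": "policies on forestry and more",
    "/join": "become a member",
}
print(rank("/forestry", pages))
```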

Priority: For a basic search, High. (404’s look unprofessional and turn away potential viewers). For an advanced “smart” search, Low.


Publicisation Rules

Description: Adopt the following policies when publicising URLs in party literature or press releases:

• Use the appropriate root (“greenparty.ca” for anglophone media, “partivert.ca” for francophone media).
• Do not include “http://”, “www.”, “.html”, or any other component which can be inferred by the web browser or web server. (In other words, publicise the minimal name).
• Publicise the highest-level URL that is appropriate. (For example, in an article about agricultural policy, it would be better to publicise “for a full list of party policies, see greenparty.ca/policy” than “for complete agricultural policy, see greenparty.ca/policy/agriculture”. A person reading the article may be interested in our other policies as well; this allows them to explore with better context).
• Publicise only the “official and best” URL. (For example, in francophone media, publicise “partivert.ca/plateforme”, not “fr.greenparty.ca/platform”).

Implementation: Simply follow the rules, taking into account other features that have or have not been implemented (for example, bilingual URL capabilities).

Priority: High. All URLs in press releases or party literature should be standardised and professional.


Case-Neutral URLs


Description: The case of alpha characters should be completely ignored in all URLs. Case-sensitivity leads to typing errors, but has no practical benefit whatsoever.

Implementation: Map all alpha characters to lower-case for incoming URLs. Ensure all existing resource names are lower-case only.

Priority: Medium. Completely reasonable URLs such as “greenparty.ca/LivingPlatform” are currently producing a 404 error.


Date-Specific Resources

Description: Adopt a standard format for naming and accessing date-specific resources. The URL of the resource should end with “/YYYY”, and optionally “-MM” and “-DD”. If a user over-specifies the date (including providing a date for a non-dated resource), the date should be automatically truncated or removed. If a user under-specifies the date for a date-specific resource, a menu should be served, or the most recent version (within the range specified) should be served. Examples:

overspecify:
greenparty.ca/values/2001 -> greenparty.ca/values
greenparty.ca/platform/2004-10-18 -> greenparty.ca/platform/2004

underspecify:
greenparty.ca/platform -> greenparty.ca/platform/2004 (most recent)
greenparty.ca/council/minutes -> {menu}

Implementation: A list of URLs where a date is expected, the specificity of date expected, and a flag for the behaviour to use when date is under-specified (menu or most recent). An algorithm to interpret dated URLs, with reference to this list, and serve the correct document or a dynamically-created menu of matching documents.
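
The list and algorithm described above might be sketched as follows. Both tables are hypothetical examples; in particular, the available dates are invented purely to make the sketch runnable.

```python
import re

# Hypothetical list: path -> (date parts expected, behaviour if under-specified).
DATED = {"/platform": (1, "latest"), "/council/minutes": (3, "menu")}
# Hypothetical inventory of existing dated versions (illustrative dates only).
AVAILABLE = {
    "/platform": ["2000", "2004"],
    "/council/minutes": ["2004-09-12", "2004-10-03"],
}
DATE_RE = re.compile(r"^(.*)/(\d{4}(?:-\d{2}){0,2})$")


def resolve_dated(path):
    """Return ("serve", url) or ("menu", [urls]) for a possibly dated URL."""
    match = DATE_RE.match(path)
    base, date = (match.group(1), match.group(2)) if match else (path, "")
    if base not in DATED:
        return ("serve", base)  # non-dated resource: any date is removed
    depth, fallback = DATED[base]
    # Truncate an over-specified date to the expected number of parts.
    wanted = "-".join(date.split("-")[:depth]) if date else ""
    available = sorted(AVAILABLE[base])
    if wanted in available:
        return ("serve", f"{base}/{wanted}")
    # Under-specified: narrow to the requested range. If nothing matches,
    # fall back to everything available (an assumption of this sketch).
    matches = [d for d in available if d.startswith(wanted)] or available
    if fallback == "latest":
        return ("serve", f"{base}/{matches[-1]}")
    return ("menu", [f"{base}/{d}" for d in matches])


print(resolve_dated("/platform/2004-10-18"))
```

Because the dates are zero-padded and ordered year, month, day, sorting them as text also sorts them chronologically.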

Priority: For the naming standard, high, to minimise renaming later. For the dated-URL interpretation algorithm, medium-to-low, as (to my knowledge) the number of dated resources currently on-line is not large.


Viewing Mediums

Description: By suffixing URLs with various particles, the user can specify their preferred viewing medium. Suggested suffixes include “/pdf” for print-quality PDFs (when available), “/textonly” for text-to-speech and handheld devices, “/largefonts” for visually-impaired-friendly formatting, “/edit” to edit wiki pages, and “/print” for automatic print-from-browser. Under this universal scheme, candidates and speakers can easily remember, for example, that printable policies are available at “greenparty.ca/platform/pdf” or “greenparty.ca/policy/forestry/pdf”. (No more remembering specific PDF filenames). Furthermore, whenever a PDF version is available, the server should recognise it and automatically add a “PDF format” button to the top corner of the served page, eliminating the need for web staff to do this manually. Some particles, such as “/largefonts”, should be made “sticky” by adding them to all internal links in the served page.

Implementation: For “/pdf”, check the directory containing the HTML file to see if a .PDF file exists, and serve that instead. For “/textonly” and “/largefonts”, the page would be filtered, including insertion of the particles into HTML links to make them “sticky”. For “/edit”, the URL should be translated and passed to the Wiki script. For “/print”, the page would need to be filtered, and code inserted to trigger auto-print if possible. For the “PDF format” button, when serving any HTML page, check for PDF equivalent and, if available, insert code for the button.
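
Two pieces of this are easy to illustrate: separating the particle from the rest of the URL, and making a particle “sticky”. The sketch below is illustrative only. It assumes internal links are written as href attributes beginning with “/”, and a regular expression is a fragile way to rewrite HTML, so a production filter would likely use a proper parser.

```python
import re

MEDIUMS = {"pdf", "textonly", "largefonts", "edit", "print"}
STICKY = {"largefonts"}  # particles carried over to internal links


def split_medium(path):
    """Split a trailing viewing-medium particle from the URL path."""
    parts = [p for p in path.lower().split("/") if p]
    medium = parts.pop() if parts and parts[-1] in MEDIUMS else None
    return "/" + "/".join(parts), medium


def make_sticky(html, medium):
    """Append a sticky particle to internal links in the served page."""
    if medium not in STICKY:
        return html

    def add_particle(match):
        return 'href="%s/%s"' % (match.group(1).rstrip("/"), medium)

    # Only links starting with "/" and without "?" or "#" are rewritten.
    return re.sub(r'href="(/[^"#?]*)"', add_particle, html)


print(split_medium("/policy/forestry/pdf"))
```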

Priority: For “/pdf”, medium priority (would be a very useful standardisation for candidates and speakers). For “/textonly”, “/largefonts”, “/edit”, “/print”, and the automatic PDF button, low.


Deprecated URL Policy

Description: Deprecated URLs should redirect automatically to their new location. If there is no new location (for example, a policy area may be split into 2 or more pages), a list of possible destinations should be served. These redirects should be maintained forever (or until a new page is created that reuses the name). URL deprecation should be kept to a minimum.

Implementation: For simple redirects, this is trivial. For splits, the concept of a “multi-destination redirect” is needed. A list of URLs paired with possible destinations would be used to dynamically generate a menu of links. (This capability would be useful for aliases as well – for example, the URL “greenparty.ca/policy/child” could serve a menu of links to “/childcare”, “/childporn”, “/childwelfare”, and so forth).
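
A “multi-destination redirect” needs little more than a table in which each old URL maps to one or more destinations. The sketch below uses hypothetical entries, and assumes the destinations in the example above live under “/policy”.

```python
# Hypothetical table: deprecated or ambiguous URL -> possible destinations.
REDIRECTS = {
    "/policy/sports": ["/policy/sportsandrecreation"],
    "/policy/child": ["/policy/childcare", "/policy/childwelfare"],
}


def resolve_redirect(path):
    """Return ("redirect", url), ("menu", [urls]), or None if not listed."""
    targets = REDIRECTS.get(path.lower().rstrip("/"))
    if not targets:
        return None
    if len(targets) == 1:
        return ("redirect", targets[0])
    return ("menu", targets)


print(resolve_redirect("/policy/child"))
```

A single destination produces an ordinary redirect; several destinations produce the data for a dynamically generated menu of links.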

Priority: Medium. Any deprecated URL which is not replaced with a redirect becomes a 404. Because of the longevity of print media and links from other websites, this is a significant problem.


