S-Lang and PCRE differences

Introduction

Several newsreaders, such as Pan, XNews and Tin, use Perl-Compatible Regular Expressions (PCRE) in their scorefiles. Slrn, by contrast, uses S-Lang's simple built-in regular expression routines. This guide is for intermediate users of PCRE-enabled newsreaders who wish to convert the regular expressions in their existing scorefiles for use in Slrn; it is neither a beginner's tutorial nor an exhaustive reference. Please note that it is currently a work in progress.

Case-sensitivity

In PCRE, case-sensitive matching is turned on with (?-i) and turned off with (?i). The S-Lang equivalents are \c (case-sensitive) and \C (case-insensitive).

Examples:

% Pan or XNews:
Score: =-9999
Subject: (?-i)HELP
Score: =9999
Subject: (?i)slrn

These rules kill posts whose Subject includes "HELP" (but not "help" or "Help"), and mark as interesting posts whose Subject includes "slrn", "SLRN" or "Slrn". The exact equivalent in Slrn is as follows:

% Slrn:
Score: =-9999
Subject: \cHELP
Score: =9999
Subject: \Cslrn
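
When many scorefile entries need converting, the substitution above can be sketched in Python. This is only an illustrative sketch, not part of Slrn or S-Lang: the function name convert_case_flags is hypothetical, and it does a plain textual replacement of the two flags described above without checking whether the opening parenthesis is escaped.

```python
def convert_case_flags(pattern: str) -> str:
    """Replace the PCRE inline case flags with their S-Lang equivalents.

    Hypothetical helper: handles only (?-i) and (?i), by plain substitution.
    """
    # (?-i) turns case-sensitivity on  -> \c in S-Lang
    # (?i)  turns case-sensitivity off -> \C in S-Lang
    return pattern.replace("(?-i)", "\\c").replace("(?i)", "\\C")


if __name__ == "__main__":
    print(convert_case_flags("(?-i)HELP"))
    print(convert_case_flags("(?i)slrn"))
```

Patterns that combine several flags, such as (?im), are outside the scope of this sketch and would need to be converted by hand.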

Word boundaries

In PCRE, word boundaries are matched with \b. S-Lang distinguishes the beginning of a word from its end: \< matches the former and \> the latter.

Examples:

% Pan or XNews:
Score: =-9999
From: \bfred\b

% Slrn:
Score: =-9999
From: \<fred\>

Both rules match " fred " but not " alfred " or " frederick ".
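
Because a single PCRE token maps to two S-Lang tokens, a converter has to decide from context which one is meant. The Python sketch below assumes that a \b immediately followed by a word character starts a word, and that a \b immediately preceded by one ends a word. The function name convert_word_boundaries is hypothetical, and the sketch ignores harder cases such as \b next to a character class or an escaped backslash.

```python
import re


def convert_word_boundaries(pattern: str) -> str:
    """Rewrite PCRE \\b as S-Lang \\< or \\> based on the neighbouring character.

    Hypothetical helper covering only the simple case of \\b next to a
    literal word character.
    """
    # \b directly before a word character: beginning of a word.
    pattern = re.sub(r"\\b(?=\w)", r"\\<", pattern)
    # \b directly after a word character: end of a word.
    pattern = re.sub(r"(?<=\w)\\b", r"\\>", pattern)
    return pattern


if __name__ == "__main__":
    print(convert_word_boundaries(r"\bfred\b"))
```

Any \b that is left unchanged by this sketch sits in a context it cannot classify and should be reviewed manually.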

Parentheses

In both S-Lang and PCRE, a parenthesis is either a literal parenthesis character or a grouping operator, depending on whether it is escaped. The two languages use opposite conventions, however.

PCRE: ( and ) group a sub-expression; \( and \) are literal matches.
S-Lang: \( and \) group a sub-expression; ( and ) are literal matches.

Examples:

% Pan or XNews:
Score: =-9999
From: (kook)\1\1
Subject: \(off-topic\)

% Slrn:
Score: =-9999
From: \(kook\)\1\1
Subject: (off-topic)

Both examples match messages written by "kookkookkook" whose Subjects contain the literal string "(off-topic)".

Note that backreferences such as \1, \2, etc. are the only use for grouping parentheses in S-Lang. Patterns that quantify a group, such as (foo)+, have no S-Lang equivalent: in S-Lang, ?, + and * apply only to a single preceding character or character class.
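
The reversal of escaped and unescaped parentheses can be sketched in Python as a single pass over the pattern. The function name swap_parentheses is hypothetical, and the sketch makes simplifying assumptions: it does not treat parentheses inside character classes specially, it does not detect quantified groups such as (foo)+ that S-Lang cannot express, and any inline flags such as (?i) are assumed to have been converted beforehand.

```python
def swap_parentheses(pattern: str) -> str:
    """Swap escaped and unescaped parentheses between PCRE and S-Lang style.

    Hypothetical helper: other escapes (for example \\1) pass through unchanged.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # An escaped parenthesis is a literal in PCRE: drop the backslash.
            # Every other escape sequence is copied as it stands.
            out.append(nxt if nxt in "()" else ch + nxt)
            i += 2
        elif ch in "()":
            # A bare parenthesis groups in PCRE: S-Lang needs a backslash.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(swap_parentheses(r"(kook)\1\1"))
    print(swap_parentheses(r"\(off-topic\)"))
```

Since the swap is symmetric, applying the same function to an S-Lang pattern converts it back to PCRE style under the same assumptions.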

Further reading

Full details of S-Lang and PCRE regular expression syntax are available here:


Version 1.0.2, last modified 2008-02-08.

Please send any comments, corrections or suggestions related to this page by email.
