All regular expressions are standard PCRE unless otherwise stated. A few might be Cirrus regexes used by Wikipedia's regex editor; see also Elasticsearch Cirrus regex syntax. Template:Regex can be used to safely test regexes.

Regex match


Regex flavors:

  1. Match piped link in any namespace (e.g., could be 'File:' in first part):
    • \[\[([^]]+)\|([^]]+)\]\]
  2. Match piped link in current namespace only (not containing colon in first part):
    • \[\[([^]:]+)\|([^]]+)\]\]
    • \[\[([^|\]]*?)\|([^]]+)\]\]
  3. Piped or unpiped link in current namespace:
    • \[\[([^|\]]+)\|?([^|\]]+)?\]\]
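A minimal sketch of patterns 1 and 2 in use, assuming Python's re module (close to PCRE for these patterns, but not identical in general). The sample wikitext is made up for illustration.

```python
import re

# Patterns 1 and 2 from the list above, copied verbatim.
ANY_NAMESPACE = re.compile(r"\[\[([^]]+)\|([^]]+)\]\]")
CURRENT_NAMESPACE = re.compile(r"\[\[([^]:]+)\|([^]]+)\]\]")

# Hypothetical sample text, not taken from a real article.
sample = "See [[Paris|the capital]] and [[File:Foo.jpg|thumb]]."

any_ns = ANY_NAMESPACE.findall(sample)          # (target, label) for both links
current_ns = CURRENT_NAMESPACE.findall(sample)  # skips the target containing a colon

print(any_ns)
print(current_ns)
```

The second pattern rejects the File: link only because its first character class excludes the colon.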
  • \[\[([^\|]{6,})([^\|]+)\|\1[^\]]?\]\]

Urls


Bareurl:

  • Unnamed: <ref>\s*\[\s*[^]]+\s*\]\s*</ref>
    • Cirrus: insource:/\<ref\>\s*\[\s*[^] ]+\s*\]\s*\<\/ref\>/
  • External url in reference (could be a complete, plaintext ref with a linked title):
    • <ref>\s*\[\s*https?:\/\/*[^] ]+\s+[^]]+\]\s*</ref>
    • Cirrus: insource:/\<ref\>\s*\[\s*https?:[^] ]+\s+[^]]+\]\s*\<\/ref\>/

Reference

  • <ref(\s+name\s*=\s*"?([-\s\w\d]+)?"?)?\s*>([^<]+)</ref>

Reference with lastN but no matching firstN

  • <ref[^>]*?>[^>]*?\|last(\d)?(?!.*?\|first\1?).*?<\/ref>

Charts


.tab file creation

  • SRCH: ^(\S+)\s(\S+)\s(\S+)\s
  • RPLC: [\n\t"\1",\n\t\2,\n\t\3\n\t],
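A rough sketch of this search/replace pair in Python, assuming re.MULTILINE so that ^ matches at each line start. The three-column input is hypothetical; real .tab source data may need different column handling.

```python
import re

# SRCH and RPLC from above; Python expands \n and \t in the replacement template.
SEARCH = re.compile(r"^(\S+)\s(\S+)\s(\S+)\s", re.MULTILINE)
REPLACE = r'[\n\t"\1",\n\t\2,\n\t\3\n\t],'

# Hypothetical input: one label and two numbers per line.
rows = "2019 10 20\n2020 30 40\n"

tab_data = SEARCH.sub(REPLACE, rows)
print(tab_data)
```

The trailing \s in the search consumes each newline, so consecutive rows end up joined directly after the closing "],".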

Citation template creation


JSTOR


JSTOR cite, and intermediate steps to get there; e.g.:

JSTOR MLA


Match a name, in First M. [N.] Last format:

  • ^[A-Z][a-zA-Z]+(?: [A-Z]\.){0,2} [A-Z][a-zA-Z]+\.?

Match two names in that format, joined by ", and ":

  • ^([A-Z][a-zA-Z]+(?: [A-Z]\.){0,2} [A-Z][a-zA-Z]+\.?)(?:, and ([A-Z][a-zA-Z]+(?: [A-Z]\.){0,2} [A-Z][a-zA-Z]+\.?))?

Match two names in that format, joined by ", and ", where the first is Last, First [M. [N.]] (this is JSTOR format):

  • ^([A-Z][a-zA-Z]+(?:'[A-Z][a-zA-Z]+)?(?:-[A-Z][a-zA-Z]+)?, [A-Z][a-zA-Z]+(?: [A-Z]\.){0,2},)(?: and ([A-Z][a-zA-Z]+(?: [A-Z]\.){0,2} [A-Z][a-zA-Z]+\.))?

Match JSTOR MLA entirely (needs slight work to eliminate commas in output):

  • SRCH: ^([A-Z][a-zA-Z]+(?:'[A-Z][a-zA-Z]+)?(?:-[A-Z][a-zA-Z]+)?, [A-Z][a-zA-Z]+(?: [A-Z]\.){0,2},)(?: and ([A-Z][a-zA-Z]+(?: [A-Z]\.){0,2} [A-Z][a-zA-Z]+\.))?\s“([^”]+)”\s([^,]+),\svol\.\s(\d+),\sno\.\s(\d+),\s(\d{4}), (pp?\.)\s([-–—\d]+)\.\sJSTOR,\s(http.*?)\.\sAccessed\s(.*?)\.
  • RPLC: 1=\1; 2=\2; 3=\3; 4=\4; 5=\5; 6=\6; 7=\7; 8=\8; 9=\9; 10=\10; 11=\11.


  • SRCH: ^([A-Z][a-zA-Z]+(?:'[A-Z][a-zA-Z]+)?(?:-[A-Z][a-zA-Z]+)?),\s([A-Z][a-zA-Z]+(?: [A-Z]\.)\s*){0,3}“([^”]+)”\s([^,]+),\svol\.\s(\d+),\sno\.\s(\d+),\s(\d{4}), (pp?)\.\s([-–—\d]+)\.\sJSTOR,\s(http.*?)\.\sAccessed\s(.*?)\.$
  • RPLC: {{cite journal |last=\1 |first=\2 |date=\7 |title=\3 |journal=\4 |volume=\5 |issue=\6 |publisher= |location= |doi= |issn= |oclc= |\8=\9 |url=\10 |access-date=\11}}

Ref tags


Match all ref tags, reused named, named, or unnamed:

  • SRCH: (?:<ref(?:\s+name="?[^">]+"?)?>[^<]+</ref>|<ref(?:\s+name="?[^">]+"?)?\s*\/>)

Remove blanks between text and ref tag (don't use \s here which matches newline):

  • SRCH:  +<ref
  • RPLC: <ref

Supply missing blank between trailing end-ref and next text word:

  • SRCH: </ref>([^\s.<])
  • RPLC: </ref> \1

Move punctuation trailing a ref tag to before the tag (with 'dot matches newline' or PCRE 's' modifier):

  • SRCH: <ref([^<]+)</ref>([.,:;?!])
  • RPLC: \2<ref\1</ref>
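The three ref-tag clean-ups above can be chained. A minimal Python sketch follows; tidy_refs is a made-up helper name. It moves punctuation before supplying the missing blank, on the assumption (not stated above) that otherwise the inserted blank would separate the tag from its punctuation.

```python
import re

def tidy_refs(text: str) -> str:
    """Apply the three ref-tag clean-ups described above (hypothetical helper)."""
    # Remove blanks between text and an opening ref tag.
    text = re.sub(r" +<ref", "<ref", text)
    # Move punctuation that trails a ref tag to before the tag.
    text = re.sub(r"<ref([^<]+)</ref>([.,:;?!])", r"\2<ref\1</ref>", text)
    # Supply a missing blank between a closing ref tag and the next word.
    text = re.sub(r"</ref>([^\s.<])", r"</ref> \1", text)
    return text

# Hypothetical sample text.
sample = "Fact <ref>Source</ref>, and more<ref>Two</ref>text."
tidied = tidy_refs(sample)
print(tidied)
```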

cite book, from Google bibliographic info

  • Search:

Title\t(.*)
(.*)
Author\t(.*)
Publisher\t(.*), (\d\d\d\d)
ISBN\t([^,]+),? ?(\d+)?
Length\t(\d+) pages$

  • Replace: {{cite book |author=\3 |last= |first= |date=\5 |title=\1 |publisher=\4 |series=\2 |isbn=\7 <!--isbn2=\6--> |oclc= |page= <!--total-pages=\8--> |url=}}

  • Search:

Title\t(.*)
Author\t(.*)
Publisher\t(.*), (\d\d\d\d)
ISBN\t([^,]+),? ?(\d+)?
Length\t(\d+) pages$

  • Replace: {{cite book |author=\2 |last= |first= |date=\4 |title=\1 |publisher=\3 |isbn=\6 <!--isbn2=\5--> |oclc= |page= <!--total-pages=\7--> |url=}}
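A sketch of the Title/Author/Publisher/ISBN/Length variant just above, assuming the bibliographic block was pasted with real tab characters between label and value. The book data is invented for illustration.

```python
import re

# The five-line search from above, joined with explicit newlines.
SEARCH = re.compile(
    r"Title\t(.*)\n"
    r"Author\t(.*)\n"
    r"Publisher\t(.*), (\d\d\d\d)\n"
    r"ISBN\t([^,]+),? ?(\d+)?\n"
    r"Length\t(\d+) pages$"
)
REPLACE = (
    r"{{cite book |author=\2 |last= |first= |date=\4 |title=\1 "
    r"|publisher=\3 |isbn=\6 <!--isbn2=\5--> |oclc= |page= "
    r"<!--total-pages=\7--> |url=}}"
)

# Hypothetical input; the ISBNs are placeholders, not real identifiers.
info = (
    "Title\tAn Example Book\n"
    "Author\tJane Doe\n"
    "Publisher\tExample Press, 1999\n"
    "ISBN\t0123456789, 9780123456786\n"
    "Length\t321 pages"
)

citation = SEARCH.sub(REPLACE, info)
print(citation)
```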

  • Search:

Title\t(.*)
Author\t(.*)
Edition\t(.*)
Publisher\t(.*), (\d\d\d\d)
Original from\t.*
Digitized\t.*
ISBN\t([^,]+),? ?(\d+)?
Length\t(\d+) pages$

  • Replace: {{cite book |author=\2 |last= |first= |date=\5 |title=\1 |edition=\3 |publisher=\4 |isbn=\7 <!--isbn2=\6--> |oclc= |page= <!--total-pages=\8--> |url=}}




  • Search:

Title\t(.*)
(.*)
Editors?\t(.*)
Edition\t(.*)
Publisher\t(.*), (\d\d\d\d)
ISBN\t([^,]+),? ?(\d+)?
Length\t(\d+) pages$

  • Replace: {{cite book |author= |last= |first= |editors=\3 |editor1-last= |editor1-first= |editor2-last= |editor2-first= |editor3-last= |editor3-first= |date=\6 |title=\1 |edition=\4 |publisher=\5 |series=\2 |isbn=\8 <!--isbn2=\7--> |oclc= |page=<!--total-pages=\9--> |url=}}


  • Search:

Title\t(.*)
(.*)
Editors?\t(.*)
Edition\t(.*)
Publisher\t(.*), (\d\d\d\d)
ISBN\t([^,]+),? ?(\d+)?
Length\t(\d+) pages$

  • Replace: {{cite book |last= |first= |editors=\3 |editor1-last= |editor1-first= |editor2-last= |editor2-first= |editor3-last= |editor3-first= |date=\6 |title=\1 |edition=\4 |series=\2 |publisher=\5 |isbn=\8 <!--isbn2=\7--> |oclc= |page= <!--total-pages=\9--> |url=}}

Citation template tweaking and reordering


Drop query params from Google books urls

  • Search: (https://books\.google\.[\w.]+/books\?id=[^&|}]+)(&pg=[\w\d]+)?[^]|}]+
  • Replace: \1\2 

Fix MOS:REFPUNCT problems

  • Search: <ref([^<]+)</ref>([.,;?!])
  • Replace: \2<ref\1</ref>

Convert <ref>{{Harvtxt...}} (with page number; sans ref name) to {{sfn}}

  • Search: <ref>{{harvtxt(?:\s*(\|\s*[^|]+)\s*)(.*?)\|\s*(\d\d\d\d)\|(p+=\d+\s*)}}\s*</ref>
  • Replace: {{sfn\1\2|\3|\4}}
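A quick check of this pair in Python, as a sketch; the citation is hypothetical. Note that the page group only accepts digits, so a range such as pp=5–7 would not match.

```python
import re

SEARCH = re.compile(
    r"<ref>{{harvtxt(?:\s*(\|\s*[^|]+)\s*)(.*?)\|\s*(\d\d\d\d)\|(p+=\d+\s*)}}\s*</ref>"
)
REPLACE = r"{{sfn\1\2|\3|\4}}"

# Hypothetical two-author short citation wrapped in ref tags.
sample = "Claim.<ref>{{harvtxt|Smith|Jones|2001|p=5}}</ref>"
converted = SEARCH.sub(REPLACE, sample)
print(converted)
```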

Citations with 'author=' to 'last=... first=...'


Assumes a regular CS1 or CS2 citation, with space before vertical bar, and '|author=' present:

  • Search: \|author=([ \w]+)\s+(\w[^\|]+)\s+\|
  • Replace: |last1=\2 |first1=\1 |

Alt (author or author1; name possibly wikilinked):

  • Search: \|author1?=\[?\[?([ -\w]+)\]?\]?\s+(\w[^\|]+)\s+\|
  • Replace: |last1=\2 |first1=\1 |

AuthorN-first (or last) to firstN or lastN

  • Search: \|author(\d)?-(last|first)=
  • Replace: |\2\1=
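A short Python sketch of this rename on a made-up citation. When the digit is absent, group 1 does not participate and Python (3.5 and later) substitutes an empty string for it.

```python
import re

# Hypothetical citation using the authorN-last / authorN-first parameter names.
sample = (
    "{{cite book |author-last=Doe |author-first=Jane "
    "|author2-last=Roe |author2-first=Rick |title=X}}"
)

renamed = re.sub(r"\|author(\d)?-(last|first)=", r"|\2\1=", sample)
print(renamed)
```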

Move url to the back

  • Search: \s*\|url=([^|\s]+)([^}]+)}}
  • Replace: \2 |url=\1}}

Possible failure case: *<!--Chenntouf-->{{cite web |last1=Chenntouf |first=Tayeb |date=1999 |title="La dynamique de la frontière au Maghreb", Des frontières en Afrique du xiie au xxe siècle |url=https://unesdoc.unesco.org/in/documentViewer.xhtml?v=2.1.196&id=p::usmarcdef_0000139816&file=/in/rest/annotationSVC/DownloadWatermarkedAttachment/attach_import_c35456f4-f4da-4b4a-b938-9d61f48fa689?_=139816fre.pdf&locale=fr&multi=true&ark=/ark:/48223/pf0000139816/PDF/139816fre.pdf#%5B%7B%22num%22:605,%22gen%22:0%7D,%7B%22name%22:%22XYZ%22%7D,-250,769,0%5D |access-date=2020-07-17 |website=unesdoc.unesco.org}}
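For comparison, a sketch of the straightforward case in Python, using a hypothetical citation whose url parameter is followed by other parameters:

```python
import re

SEARCH = re.compile(r"\s*\|url=([^|\s]+)([^}]+)}}")
REPLACE = r"\2 |url=\1}}"

# Hypothetical citation with the url parameter in first position.
sample = "{{cite web |url=http://example.com/page |title=Foo |date=2020}}"
moved = SEARCH.sub(REPLACE, sample)
print(moved)
```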

Swap last with first

  • Search: \|first=([^|]+)\s\|last=([^|]+)\s
  • Replace: |last=\2 |first=\1

Swap editor-last with editor-first

  • Search: \s*\|editor-first=([^|]+)\s*\|editor-last=([^|]+)\s*\|
  • Replace: |editor-last=\2 |editor-first=\1|

Swap editorN-last with editorN-first

  • Search: \s*\|editor(\d)-first\s*=\s*([^|]+)\s*\|editor\1-last\s*=\s*([^|]+)\s*\|
  • Replace: |editor\1-last=\3|editor\1-first=\2|

Swap lastn with firstn

  • Search: \|title=([^|]+)\s?\|(last\d?)=([^|]+)\s?\|(first\d?)=([^|]+)\s?
  • Replace: |\2=\3 |\4=\5 |title=\1

Move last-first before title

  • Search: \|title=([^|]+)\s*\|last=([^|]+)\s\|first=([^|]+)\s
  • Replace: |last=\2 |first=\3 |title=\1

Move year after first

  • Search: ^(.*?)\|first([^|]+)(.*?)\s*\|year=(\d+)(.*?)$
  • Replace: \1|first\2|year=\4 \3\5

Punctuation after citation, to before


Sfn:

  • Search: ({{sfn[^}]+}})([-–—,;!\?\.])
  • Replace: \2\1

Swap |first=X |last=y around so last is first in citation

  • Search: \|first(\d)=([^|]+)\s\|last\1=([^|]+)\s
  • Replace: |last\1=\3 |first\1=\2

plain refs to cite web


Text sources which don't use {{cite web}} may be transformed by a series of regex replaces, if the format is reasonably standard. For example, this change by this series:

* => * {{cite web |last=
(\, ?)(.*)$ => |first=\2
\.\ +''(.*?)'' => |title=\1
first=([\s\w]+),\s+and\s+([\s\w]+),\s+([\s\w]+) => first=\1 |last2=\2 |first2=\3
\((\d{4})\) => |year=\1
\((\d{4})\)\s+([ \w]+)\. => |year=\1 |publisher=\2
\s+isbn\s+([-\d]{10,17}) => |isbn=\1
$ => |ref=harv }}

See also User:Mathglot/sandbox/Templates/Cite MLA (in progress...)

Updating named refs to template:R


Example: Holocaust denial, revision 843383121. Three steps:

1. change quoted named refs:
<ref name="([^"]+)"\s*\/> -> {{R|\1}}

2. change unquoted named refs (with or without trailing blanks before the slash)
<ref name=([^ #"'/=>?\\]+)\s*/> -> {{R|\1}}

3. combine consecutive R's
{{R\|([^}]+)}}\s*{{R\|([^}]+)}} -> {{R|\1|\2}} g(repeat till done)

Edit summary:

Minimize visual impact on the wikicode of [[WP:NAMEDREFS|named refs]] using [[Template:R]]. No change to rendered footnote section. Using global regex replace: 1: (change quoted named refs): s!<ref name="([^"]+)"\s*\/>!{{R|\1}}!g 2: (change unquoted named refs): s!<ref name=([^ #"'/=>?\\]+)\s*/>!{{R|\1}}!g 3: (combine consecutive Rs into one): s!{{R\|([^}]+)}}\s*{{R\|([^}]+)}}!{{R|\1|\2}}!g
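The three steps can be sketched in Python as below, using the substitutions as recorded in the edit summary. refs_to_r is a made-up helper name, and the loop stands in for "repeat till done".

```python
import re

def refs_to_r(text: str) -> str:
    """Hypothetical helper: convert reused named refs to {{R}} calls."""
    # Step 1: quoted named refs.
    text = re.sub(r'<ref name="([^"]+)"\s*\/>', r"{{R|\1}}", text)
    # Step 2: unquoted named refs, with or without blanks before the slash.
    text = re.sub(r"""<ref name=([^ #"'/=>?\\]+)\s*/>""", r"{{R|\1}}", text)
    # Step 3: combine consecutive R's; one pass merges pairs, so repeat until stable.
    count = 1
    while count:
        text, count = re.subn(
            r"{{R\|([^}]+)}}\s*{{R\|([^}]+)}}", r"{{R|\1|\2}}", text
        )
    return text

# Hypothetical sample with three consecutive reused refs.
sample = 'Text<ref name="Smith 2001"/><ref name=Jones /><ref name="Roe"/>.'
combined = refs_to_r(sample)
print(combined)
```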

Page history


from scrape of rendered page


Created page to parsed data


SRCH:^(\d+:\d\d,\s\d+\s[JFMASOND][a-z]+\s\d{4})\s\|\s(\d+)\sdiff hist .*? N ([^+]+)\sCreated page with '{{Expand ([^|]+)\|\s*([^}]+)}}.*?$
RPLC:* ts=\1; rev=\2; ti=\3; lang=\4; src=\5


SRCH: ^\* ts=([^;]+); rev=([^;]+); ti=([^;]+); lang=([^;]+); src=(.*?)$
RPLC: * ts=\1; rev=[[Special:Diff/\2|\2]]; ti=[[\3]]; lang=\4; src=[[:{{#if: {{iso 639 name|fn=iso_639_name_exists|\4}} | {{iso 639 name|fn=iso_639_name_to_code|\4}}|\4}}:\5]]

from HTML page source


Article page history to parsed data


Turn article page history into a series of parsed lines:

  • 1=ARTICLE_TITLE 2=REVISION 3=HH:MM 4=Month DD, YYYY 5=TOTAL_BYTES 6=BYTE_CHANGE
  1. Go to article page history page
  2. Rt-click, Page source
  3. Select-all, copy, paste
  4. Apply Search/Replace Regex below, with "dot matches newline"
  5. Optional step to convert underscore to blank in article titles

SEARCH:
<li.*?index.php\?title=([^&]+)&oldid=(\d+)[^>]+>(\d\d:\d\d),\s(.*?)</a>.*?title="([,\d]+)\sbytes after change of this size">(.?\d+)</span>.*?</li>

To generate the following output, use this replacement:
1=ARTICLE_TITLE 2=REVISION 3=HH:MM 4=Month DD, YYYY 5=TOTAL_BYTES 6=BYTE_CHANGE

REPLACE:
1=\1 2=\2 3=\3 4=\4 5=\5 6=\6

To generate the following sample output, use this replace instead:

REPLACE:
* [[Special:Permalink/\2|\2]] [[\1]] [[Special:Diff/\2|diff]] \3 \4; (change:\6b to \5 bytes)

To generate a six-column table row with this data, including one extra column for remarks, use this:
REPLACE:
|-
| [[\1]] || [[Special:Permalink/\2|\2]] || [[Special:Diff/\2|\6]] || \5 || \3 \4 || any remark here

Followed by optional underscore replacement. (s/_/ /gi).

To generate the following table row examples (table header/footer code added for context):

Article history for Example user
Article Perm Diff Len Timestamp Remark
Risk aversion916661155-131,67100:29 September 20, 2019any remark
History of the provincial electoral map of Quebec916660706-126,98800:26 September 20, 2019other remark

User contribution history to parsed data


Turn user contribution history into a series of parsed lines:

  • 1=REVISION 2=TITLE 3=TIMESTAMP 4=BYTE_CHANGE 5=EDIT_SUMMARY
  1. Go to user contrib history page
  2. Rt-click, Page source
  3. Select-all, copy, paste
  4. Find '<h4 class="mw-index-pager-list-header-first' and cut everything above it.
  5. Find '
  6. Apply Search/Replace Regex below, with "dot matches newline"
  7. Optional step to convert underscore to blank in article titles

SEARCH: (options: dot matches newline)
^<li data-mw-revid="(\d+)".*?class="mw-changeslist-date" title="(.*?)">(.*?)</a>.*?size">(.*?)</strong>.*?parentheses">(.*?)</span>.*?</li>$

To generate the following output
1=REVISION 2=TITLE 3=TIMESTAMP 4=BYTE_CHANGE 5=EDIT_SUMMARY
use this replacement:

REPLACE:
1=\1 2=\2 3=\3 4=\4 5=\5

To generate: rev=REVISION title=TITLE time=TIMESTAMP bytes=BYTE_CHANGE summary=EDIT_SUMMARY
REPLACE:
rev=\1 title=\2 time=\3 bytes=\4 summary=\5

To generate: rev=REVISION title=TITLE
SEARCH: (options: dot matches newline)
^<li data-mw-revid="(\d+)".*?title="([^"]+).*?</li>$
REPLACE:
rev=\1 title=\2

New contribs Translated pages to bullet list


From Special:contribs with 'new' pages box ticked; extracting pages with ContentTranslation tool summary:

  1. Search: \)‎ \. \. N\s([^(]*?) ‎ \(Created by translating the page "([^"]+)"
  2. Copy matches
  3. Replace: * [[\1]] from [[es:\2]]

Other regex replace


Add leading hidden token to ref-named citations as prep for sorting the Bibliography

  • Search: <!--{{sfn\|LAST\|YYYY\|p=}}--> *<ref name="([\(\)\w]+)\s+(\d+)">
  • Replace: *<!--{{sfn|\1|\2|p=}}-->

Alphabetize citations in Bibliography


The technique is 1) add a leading token consisting of the (first) last name, 2) sort, 3) strip out the token. Only step 1 is shown:

  • Search: ^\*\s*{{cite(.*?)\|\s*last1?\s*=\s*([^|]+)\s*(.*)$
  • Replace: **<!--\2-->{{cite\1 |last1=\2\3

Convert glossary anchor to vanchor


SEARCH:
^;\s*{{Anchor\|([^\}]+)}}(?:[-<>\s,:\w\d]+)$
REPLACE:
;{{Vanchor|\1}}

Convert glossary <term> to be in-linkable


SEARCH:
^{{term\s*\|(term\s*=\s*)?([^|{}]+)
REPLACE:
{{term|\1|2={{Vanchor|\2}}

ES: Convert glossary <term>s to be in-linkable via global regex replace s!^{{term\s*\|(term\s*=\s*)?([^|{}]+)!{{term|\1|2={{Vanchor|\2}}!g

Convert mu (Chinese acres)


SRCH: (\d[.,\d]+) mu(\b)
RPLC: \1 mu {{convert mu|\1|mu|ha}}\2   (to 'hectares')
RPLC: \1 mu {{convert mu|\1|mu|ha|abbr=on}}\2   (to 'ha')
RPLC: \1 mu {{cvt mu|\1|mu|ha}}\2   (to 'ha', via wrapper)


Parse wikilinks, exclude colons to exclude namespaces (this will exclude wikilinks that have colons in the anchor):

$1 = Target article $2 = Anchor (#-fragments untested):

  • \[\[([^:\|\]]+)\|?([^:\]]+)?\]\]

This saves the pipe (if there is one) in \2, so can use replace to generate lang-prefixed links, for example, if translating a nav template from en to fr, one could start like this:

  • Search: \[\[([^:\|\]]+)(\|?[^:\]]+)?\]\]
  • Replace: [[:en:\1\2]]

This adds superscript wikidata links to all wikilinks on a page so they can be easily translated:

  • Search: \[\[([^:\|\]]+)(\|?[^:\]]+)?\]\]('')?
  • Replace: [[\1\2]]\3<sup>[[[d:{{subst:wikidata|label|raw|page=\1}}#sitelinks-wikipedia|wd]]]</sup>

Interlanguage template transformation

  1. Turn {{ca:GEC}} into {{sfn}}:
    • Search: {{GEC\|id=([\d]+)\|nom=([ \w]+).*?}}
    • Replace: {{sfn|GEC|loc=[http://www.enciclopedia.cat/EC-GEC-\1.xml \2]}}

Convert italic markup to lang templates


First, fix the links (two types, depending where italic markup is):

  • piped (type 1): e.g., ''[[École navale|FOO]]'' ⟶ {{lang|fr|[[École navale|FOO]]}}
    • SRCH: (?<!')''\[\[([^]|]+)\|([^]]+)\]\]''(?!') # handles the 2-pop case; excludes 3-pop, but also 5-pop; add (?:''')? for that
    • RPLC: {{lang|fr|[[\1|\2]]}}
  • piped (type 2): e.g., [[École navale|''FOO'']] ⟶ {{lang|fr|[[École navale|FOO]]}}
    • SRCH: \[\[([^]|]+)\|''([^']+)''\]\]
    • RPLC: {{lang|fr|[[\1|\2]]}}
  • unpiped (order matters; must be done after piped links): e.g., ''[[École navale]]'' ⟶ {{lang|fr|[[École navale]]}}
    • SRCH: ''\[\[([^]|]+)\]\]''
    • RPLC: {{lang|fr|[[\1]]}}
  • What's left is unlinked: e.g., ''École navale'' ⟶ {{lang|fr|École navale}}
    • SRCH: (?<!')''([^']+)''(?!')
    • RPLC: {{lang|fr|\1}}
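A sketch of the final, unlinked case in Python. It assumes a negative lookahead (?!') after the closing pair of apostrophes. A lookbehind in that position would always see an apostrophe and could never succeed. The sample sentence is invented.

```python
import re

# Exactly two apostrophes on each side: a lookbehind guards the opening pair,
# a lookahead guards the closing pair, so '''bold''' is left alone.
ITALIC = re.compile(r"(?<!')''([^']+)''(?!')")

sample = "the ''École navale'' and '''bold''' text"
tagged = ITALIC.sub(r"{{lang|fr|\1}}", sample)
print(tagged)
```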

FR - EN article translation preprocessing

1 <ref>{{(\w\w)}}\s*{{citation\|(.*?)</ref> -> {{efn|"{{lang|\1|\2}}"}}
2 ''{{lang\|de\|(.*?)}}'' -> {{lang|de|\1}}
3 {{citation\|(.*?)}} -> "\1"
4 <ref>\s*{{de}}\s*(.*?)\s*</ref> -> {{efn|{{lang|de|\1}}}}
5 <ref>{{harvsp\|(.*?)}}.</ref> -> {{sfn|\1}}

Substify and unsubstify

Find expressions in need of safesubst (two left curlies, but not three or more)
  • Search: (?<!{){{(?!{)
Substify via |safesubst
  • Search: (?<!{){\{(?!\{)
  • Replace: {{ {{{|safesubst:}}}
  • ES:         s!(?<!{){\{(?!\{)!{{ {{{|safesubst:}}}!g
Unsubstify via |safesubst
  • Search: \s*{{\s*{{{\s*\|safesubst:\s*}}}
  • Replace: {{
  • ES:         s!\s*{{\s*{{{\s*\|safesubst:\s*}}}!{{!g
Substify via safesubst:<noinclude />
  • Search: (?<!{){\{(?!\{)
  • Replace: {{safesubst:<noinclude />
Unsubstify via safesubst:<noinclude />
  • Search: {{safesubst:<noinclude />
  • Replace: {{
Substify via safesubst:    Recommended
  • Search: (?<!{){\{(?!\{)
  • Replace: {{safesubst:
Unsubstify via safesubst:
  • Search: {{safesubst:
  • Replace: {{
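A round-trip sketch of the recommended safesubst: variant in Python. substify and unsubstify are made-up helper names and the template call is hypothetical. The lookarounds leave triple-brace parameters untouched.

```python
import re

# Two left curlies that are neither preceded nor followed by a third one.
OPEN_TEMPLATE = re.compile(r"(?<!{){\{(?!\{)")

def substify(wikitext: str) -> str:
    return OPEN_TEMPLATE.sub("{{safesubst:", wikitext)

def unsubstify(wikitext: str) -> str:
    return wikitext.replace("{{safesubst:", "{{")

sample = "{{foo|{{{1}}}|{{bar}}}}"
substed = substify(sample)
print(substed)
print(unsubstify(substed))
```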

Aimed at Nav template translation, so handles bulleted links, optional pops or bolding, and specific lang prefix:

Unpiped links (e.g., * ''[[:fr:Documents maçonniques]]''):

  • Search: ^\*\s*('*)?\[\[:fr:([^]]+)\]\]('*)?
  • Replace: \1{{ill|ENGLISHNAME|fr|\2|v=sup}}\3

Piped links (e.g., * ''[[:fr:Idées (revue, 1941-1944)|Idées]]''):

  • Search: ^\*\s*('*)?\[\[:fr:([^]|]+)\|?([^]]+)?\]\]('*)?$
  • Replace: \1{{ill|ENGLISHNAME|fr|\2|lt=\3|v=sup}}\4

For bios or proper names, duplicate the Foreign name in the English article field:

  • Search: ^\*\s*('*)?\[\[:fr:([^]|]+)\|?([^]]+)?\]\]('*)?$
  • Replace: * \1{{ill|\2|fr|\2|lt=\3|v=sup}}\4

Examples:

  • * ''[[:fr:Le Juif et la France]]''
    • * ''{{ill|Le Juif et la France|fr|Le Juif et la France|lt=|v=sup}}''
  • * ''[[:fr:Combats]]''
    • * ''{{ill|Combats|fr|Combats|lt=|v=sup}}''
  • * ''[[:fr:Idées (revue, 1941-1944)|Idées]]''
    • * ''{{ill|Idées (revue, 1941-1944)|fr|Idées (revue, 1941-1944)|lt=Idées|v=sup}}''
  • * [[:fr:Publications antisémites en France]]
    • * {{ill|Publications antisémites en France|fr|Publications antisémites en France|lt=|v=sup}}
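The bios/proper-names variant can be tried in Python roughly as follows, assuming re.MULTILINE so that ^ and $ work per line. The input lines are taken from the examples above.

```python
import re

SEARCH = re.compile(
    r"^\*\s*('*)?\[\[:fr:([^]|]+)\|?([^]]+)?\]\]('*)?$", re.MULTILINE
)
REPLACE = r"* \1{{ill|\2|fr|\2|lt=\3|v=sup}}\4"

nav_lines = (
    "* ''[[:fr:Combats]]''\n"
    "* ''[[:fr:Idées (revue, 1941-1944)|Idées]]''"
)

# The unpiped link leaves group 3 unmatched; Python substitutes an empty string.
ill_lines = SEARCH.sub(REPLACE, nav_lines).split("\n")
print(ill_lines)
```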

Section demote

  • Search: ^(={2,5})([^=].*?)\1
  • Replace: =\1\2\1=

Subsection promote

  • Search: ^=(={2,5})([^=].*?)\1=
  • Replace: \1\2\1
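Demote and promote are inverses of each other, which a small Python sketch can confirm. demote and promote are made-up helper names, re.MULTILINE is assumed, and the headings are hypothetical.

```python
import re

def demote(wikitext: str) -> str:
    # Add one "=" on each side of every level 2-5 heading.
    return re.sub(r"^(={2,5})([^=].*?)\1", r"=\1\2\1=", wikitext, flags=re.MULTILINE)

def promote(wikitext: str) -> str:
    # Remove one "=" from each side of every level 3-6 heading.
    return re.sub(r"^=(={2,5})([^=].*?)\1=", r"\1\2\1", wikitext, flags=re.MULTILINE)

sample = "== History ==\n=== Early years ===\nText."
demoted = demote(sample)
print(demoted)
print(promote(demoted))
```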

Reflib section from last-first-year

  • Search: ^\*\s*(.*?)\|last=([^|]+)\s\|first=([^|]+)\s*\|year=(\d+)(.*?)$
  • Replace:
    == \2-\4 ==
    \1|last=\2 |first=\3|year=\4\5

Regionalize English: AE to BE


Zed to ess (recognize ⟶ recognise)

  • SRCH: /((?:[a-z-[aeiuo]]{0,3}[aeiouy]{1,2}){1,}[a-z-[aeiuo]]{0,3}[iy])z((?:e|ed|es|er|ers|ing)\b)/g
  • RPLC: $1s$2
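The [a-z-[aeiuo]] construct is character class subtraction, which Python's re module does not support. The sketch below is an adaptation rather than the pattern as written: it spells the consonant class out by hand and uses Python's \1 replacement syntax. The sample sentence is invented.

```python
import re

# a-z minus a, e, i, o, u, written out explicitly (y stays in, as in the original).
CONS = r"[b-df-hj-np-tv-z]"

ZED = re.compile(
    r"((?:" + CONS + r"{0,3}[aeiouy]{1,2}){1,}" + CONS + r"{0,3}[iy])"
    r"z((?:e|ed|es|er|ers|ing)\b)"
)

sample = "They recognize it while organizing; the size stays."
british = ZED.sub(r"\1s\2", sample)
print(british)
```

Short words such as "size" are skipped because the pattern needs at least one vowel group before the final i or y.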
  • SRCH: ^ *(\*+) *\[\s*(https?:\/\/\S+) +([^]]+)\] *(.*)$
  • RPLC: \1 <nowiki>\2</nowiki> – \3 \4

JSON


Json to Lua


Strip first two double quotes before the first colon in every line

  • SRCH: ^([^":]+)"([^":]+)":(.*)$
  • RPLC: \1\2 =\3

(A follow-up simple regex is required to change (single) brackets to curlies.)
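A sketch of the first step in Python, assuming re.MULTILINE and a small made-up JSON fragment. The follow-up bracket replacement is not shown.

```python
import re

SEARCH = re.compile(r'^([^":]+)"([^":]+)":(.*)$', re.MULTILINE)

# Hypothetical JSON input.
json_text = '{\n  "name": "foo",\n  "count": 3\n}'

# Drops the quotes around each key and turns the colon into " =".
lua_like = SEARCH.sub(r"\1\2 =\3", json_text)
print(lua_like)
```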

Cirrus searches


Articles with lots of Talk page archives


PCRE tips


Dot matches newline (dotall)


To turn dotall on and off within a pattern, or just for a particular group:

  • (?s) – turn dotall on in pattern
  • (?-s) – turn dotall off in pattern
  • (?s:...) – turn dotall on for a specific group:
  • (?-s:...) – turn dotall off for a specific group:
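The scoped forms also work in Python's re module (3.6 and later), which makes for a quick test bed. As far as I know, Python accepts the global (?s) only at the start of a pattern and has no mid-pattern (?-s), so only the start-of-pattern and group forms are shown in this sketch.

```python
import re

text = "a\nb"

plain = re.search(r"a.b", text)                  # dot does not match the newline
whole = re.search(r"(?s)a.b", text)              # dotall on for the whole pattern
scoped_on = re.search(r"a(?s:.)b", text)         # dotall on for one group only
scoped_off = re.search(r"(?s)a(?-s:.)b", text)   # dotall off again inside the group

print(plain, whole, scoped_on, scoped_off)
```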

Lookaround


Lookahead

  • (?=) – Positive lookahead
    • q(?=u) – q followed by u; nothing saved
    • q(?=(u)) – q followed by u; u saved in \1
    • q(?=u)i – does not match quit, and never matches any string (one char cannot be both u and i)
  • (?!) Negative lookahead –
    • q(?!u) – q not followed by u

Lookbehind

  • (?<=text) – Positive lookbehind –
  • (?<!text) – Negative lookbehind –

Examples

  • (?<=a)b against "thingamabob" – matches first b
  • \b\w+(?<!s)\b – words not ending in s
    • \b\w*[^s\W]\b – same thing w/o lookbehind, but which is easier to understand?
    • note that \b\w+[^s]\b won't do it, e.g., cuz of "John's"
  • Double requirements; e.g., length + contents:
    • a word that is six letters long and contains the three consecutive letters cat:
      • no lookahead: cat\w{3}|\wcat\w{2}|\w{2}cat\w|\w{3}cat
      • lookahead, #1: (?=\b\w{6}\b)\b\w*cat\w*\b
      • optimize, #2: (?=\b\w{6}\b)\w*cat\w* – by removing zero-length word boundaries
      • optimize, #3: (?=\b\w{6}\b)\w{0,3}cat\w* – in a successful match, there can never be more than 3 letters before "cat"
      • optimize, #4: \b(?=\w{6}\b)\w{0,3}cat\w* – since 1st \b is zero-length, there's no need to put it inside the lookahead
    • any word 6–12 letters long containing "cat", "dog" or "mouse":
      • \b(?=\w{6,12}\b)\w{0,9}(cat|dog|mouse)\w* – using optimizations from above
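A few of these examples checked in Python, whose re module supports the same lookaround syntax as long as lookbehinds are fixed-width. The word lists are made up.

```python
import re

# (?<=a)b against "thingamabob": index of the first b preceded by a.
first_b = re.search(r"(?<=a)b", "thingamabob").start()

# Words not ending in s.
no_final_s = re.findall(r"\b\w+(?<!s)\b", "cats dog birds fish")

# Six-letter words containing "cat" (optimized form #4 above).
six_with_cat = re.findall(r"\b(?=\w{6}\b)\w{0,3}cat\w*", "bobcat catalog cats concat")

print(first_b, no_final_s, six_with_cat)
```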

Conditionals

  • (?(if)then|else) – then and else can be any regex
  • (?(?=regex)then|else) – conditional with positive lookahead in if part
  • Example: extract email headers