Wikipedia:Bots/Requests for approval/CutlassBot

CutlassBot

Operator: Dw31415 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 11:56, Sunday, June 14, 2026 (UTC)

Function overview: Replace archive.today links with the original source link where possible, skipping links already hidden by CS1 templates.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python, Pywikibot

Source code available: https://gitlab.wikimedia.org/dw31415/cutlass-bot

Links to relevant discussions (where appropriate): Wikipedia talk:archive.today guidance#Wrap standalone, blacklisted link

Edit period(s): Continuous

Estimated number of pages affected: 88,385

Namespace(s): Mainspace/Articles

Exclusion compliant (Yes/No): Yes

Function details:

Use Quarry to identify deprecated external archive links in namespace 0
Filter to archive links whose path contains an embedded http(s)://{hostname} URL, and extract that source URL (note: there are no plans to test whether the link is live; see discussion)
Find the context of the link in the page
Filter to links in [] (not in a template)
Replace the link (replacement details are still under discussion; see Discussion)
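
To illustrate the extraction and filtering steps listed above, here is a minimal Python sketch. It is not taken from the bot's repository. The host list, the regular expressions, and the function names `extract_source_url` and `bracketed_links_outside_templates` are assumptions made for illustration, and the brace counting is a rough stand-in for a real wikitext parser.

```python
import re
from typing import List, Optional, Tuple

# Assumption: an illustrative host list, not the bot's actual configuration.
ARCHIVE_HOSTS = ("archive.today", "archive.ph", "archive.is")
_HOSTS = "|".join(re.escape(h) for h in ARCHIVE_HOSTS)

# Matches long-form links such as
# https://archive.ph/20250101120000/https://example.com/page
ARCHIVE_URL_RE = re.compile(
    rf"^https?://(?:www\.)?(?:{_HOSTS})/[^/\s]+/(https?://\S+)$",
    re.IGNORECASE,
)

# A bracketed external link: [url optional label]
BRACKET_LINK_RE = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]")


def extract_source_url(archive_url: str) -> Optional[str]:
    """Return the source URL embedded in the archive path, or None."""
    match = ARCHIVE_URL_RE.match(archive_url.strip())
    return match.group(1) if match else None


def bracketed_links_outside_templates(wikitext: str) -> List[Tuple[str, str]]:
    """Return (url, label) pairs for [url label] links not nested in {{...}}."""
    results = []
    depth = 0  # current {{ }} nesting depth
    i = 0
    while i < len(wikitext):
        if wikitext.startswith("{{", i):
            depth += 1
            i += 2
        elif wikitext.startswith("}}", i):
            depth = max(depth - 1, 0)
            i += 2
        elif wikitext[i] == "[" and depth == 0:
            match = BRACKET_LINK_RE.match(wikitext, i)
            if match:
                results.append((match.group(1), match.group(2) or ""))
                i = match.end()
            else:
                i += 1
        else:
            i += 1
    return results
```

Short-form archive links that carry no embedded source URL return None from `extract_source_url`, so under this sketch they would simply be skipped rather than guessed at.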

Note: See dry run at User:Dw31415/ArchiveEdits1

Discussion

Background and Proposal: In February, the WP:NOMOREARCHIVETODAY RfC reached a consensus to “remove” all links to archive today. Since then good efforts have been made to replace the links or hide them when contained in templates. However, more than 100,000 links remain visible.

This week, Wikipedia editors documented instances in which archive.today links redirected readers to the Tehran Times rather than the expected archived content^[1]. The behavior was captured on video and discussed at Wikipedia talk:Archive.today guidance. This new behavior reduces the utility of the remaining links and demonstrates that readers cannot reliably predict where these links will lead.

I intend this bot to implement the existing community consensus by replacing archive.today links with their original source URLs when those URLs can be identified. I do not intend the bot to evaluate the continued availability of the original source material. Rather, it will restore the target selected by the original editor while removing links to a service that the community has already determined should no longer be presented to readers.

I currently operate DwAlphaBot, but propose this task should be conducted by a new bot to improve traceability and distinguish these edits from DwAlphaBot’s other approved tasks. The code is not yet complete, and discussion is ongoing regarding implementation details, including whether any hidden metadata should be preserved^[2].

I am seeking early review and guidance from BAG and interested editors and request approval for an initial trial of up to 20 edits involving only deterministic replacements where the original URL is reviewed by me. Dw31415 (talk) 13:50, 14 June 2026 (UTC)[reply]

I oppose any bot replacing these links without checking for their availability at the original URL and without checking their availability at Wayback Machine/Ghostarchive/Megalodon. sapphaline (talk) 14:53, 14 June 2026 (UTC)[reply]
Thank you for considering it. Do you oppose CS1 hiding the archive links? That’s by far the greater number of links (~600k). I’m trying to gain support for a similar approach. Are there any mitigations that would win your support? Dw31415 (talk) 15:17, 14 June 2026 (UTC)[reply]
"Do you oppose CS1 hiding the archive links?" - no, because this creates a backlog of links that need replacement. Your approach essentially means offloading one big backlog (visible archive.today links) to a different big backlog (dead links), which is even bigger and has even less people interested in cleaning it up. sapphaline (talk) 16:21, 14 June 2026 (UTC)[reply]

"Are there any mitigations that would win your support?" - checking the availability of the original URL and assigning appropriate |url-status= to the citation is a bare minimum; ideally the bot should also check the mentioned archives and add an archived copy in case it's not a redirect (3xx) or an error page (4xx/5xx). If this is implemented, then the bot should also add a hidden category for every affected page so that editors can check after the bot and replace inappropriate archives added by it. sapphaline (talk) 16:43, 14 June 2026 (UTC)[reply]
Nice idea for the category. I’ve added one to the template (link at other discussion). I need to check how to make it hidden Dw31415 (talk) 17:20, 14 June 2026 (UTC)[reply]

It would be easy to check the Wayback Machine API and mark whether a snapshot exists. It would be harder to do more than that. Dw31415 (talk) 17:21, 14 June 2026 (UTC)[reply]
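
A Wayback Machine check of the kind mentioned above might look roughly like the following Python sketch. It assumes the public availability endpoint at https://archive.org/wayback/available and a JSON response containing `archived_snapshots` and `closest` fields with a string `status`. The function names are hypothetical and this is not the bot's code.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability query URL for a source URL."""
    return f"{WAYBACK_API}?{urlencode({'url': url})}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL if one is reported as a 200 capture."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest.get("url")
    return None


def lookup(url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the API over the network and return a snapshot URL or None."""
    with urlopen(availability_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

A result from `lookup` only indicates that some capture exists. It does not show that the capture holds the same content as the removed archive, which is the harder part raised in this thread.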
(edit conflict) I oppose unless the content is available at the original URI (simply not being a 404 is not enough - check for usurpations, soft 404s, domain reselling pages, etc). In other situations it should be replaced with a working, non-deprecated archive or marked as a dead link with a comment noting that an archive.today link was removed with a link to why. If the checking cannot be done reliably by a bot then it is not a task suitable for a bot. Thryduulf (talk) 16:50, 14 June 2026 (UTC)[reply]
Please say more about “marked as a dead link with a comment”. The Mauer PDF in the linked discussion is a good example. The source URL returns some minimal text (not a 404). I haven’t checked the Wayback Machine for it yet. Dw31415 (talk) 16:59, 14 June 2026 (UTC)[reply]
By marking as a dead link with a note, I mean cases where, if the archive.today copy didn't exist, the link would be tagged using {{dead link}} (or similar), but leaving a hidden comment that the AT copy does exist if anyone wants to view it (doing so may enable them to find the information elsewhere, for example). Thryduulf (talk) 22:02, 14 June 2026 (UTC)[reply]
It looks like the hidden comment part is there for all links - the template currently renders in wikicode like this according to its documentation (one space added to defeat edit filter), which includes the information about the archive.today link. Tazerdadog (talk) 22:49, 14 June 2026 (UTC)[reply]
Thanks! I’ll fix tonight. Dw31415 (talk) 23:23, 14 June 2026 (UTC)[reply]

@Thryduulf, have you experienced the redirect to Tehran Times? If not, I'd ask you to try it. I find it unsettling. You might try the "tally-ho" archive at Frank_Frazetta. I tried to reproduce it, but my home ISP actually blocks archive today. I get a connection refused error. Please try it and let us know if the Tehran Times redirect still reproduces. Dw31415 (talk) 12:45, 15 June 2026 (UTC)[reply]
I have not personally experienced that behaviour, but I don't understand the relevance to my objection. My objection is to removing an AT link without one of (a) a replacement archive of the content, (b) a working link to the content, or (c) marking the link as dead with a note that the AT archive exists (but is not suitable for reasons explained at a linked page). {{Deprecated archive}} matches (c) only if no other archive or live copy of the content exists. Thryduulf (talk) 13:00, 15 June 2026 (UTC)[reply]

@Thryduulf, thanks for calling me back to your objection. My concern is that the conditions you outline are difficult for a bot to evaluate reliably at scale.

However, I do not think it is appropriate to hold this bot task to a different standard than the one already applied through the CS1 implementation. The community has already accepted an approach in which links are suppressed from reader view based solely on the presence of an archive.today URL. That implementation did not depend on establishing that the original URL was live, that another archive existed, or that the archive.today snapshot was not uniquely valuable.

If the community’s position is that those evaluations are required before a hidden archive.today link may be removed from view, then it follows that the same evaluations should have been required before those links were hidden from readers in the first place. I do not believe BAG should create a higher threshold for these links than the threshold that was used to hide them with CS1.

Am I missing something about how this proposal compares to the CS1 implementation or the consensus from the RfC? Dw31415 (talk) 19:59, 15 June 2026 (UTC)[reply]
If the conditions cannot be reliably evaluated by a bot then this is not a task that is appropriate for a bot to perform. Thryduulf (talk) 21:06, 15 June 2026 (UTC)[reply]
It would seem that you find the CS1 implementation to be objectionable as well, is this correct? fifteen thousand two hundred twenty four (talk) 21:12, 15 June 2026 (UTC)[reply]
If that is also being applied without meeting the above necessary conditions then yes, but my understanding is that that is not changing the wikitext and so does not harm the encyclopaedia in the same way a bot will Thryduulf (talk) 07:32, 16 June 2026 (UTC)[reply]
It was applied 22 February 2026 without meeting your desired criteria, but it did meet the RFC consensus that the links be removed as soon as practicable (though if we were to split hairs, hiding isn't removal). fifteen thousand two hundred twenty four (talk) 08:57, 16 June 2026 (UTC)[reply]
Just because we have previously been reckless with the encyclopaedia is not an acceptable justification for doing something more reckless (at best) again. The RFC did not give editors carte blanche to harm the encyclopaedia in order to achieve a goal motivated by a moral panic rather than rational thought. Thryduulf (talk) 09:19, 16 June 2026 (UTC)[reply]
I have no desire to relitigate the RFC, which it's appearing more and more like that's what this is. The consensus there found that directing readers to an archive that hijacks connections to perform attacks and modifies its contents to target certain persons was harmful, and that links to said archive should be removed asap. Any claim that this rational finding was motivated by moral panic is one I can't take seriously. I'll be focusing my attention elsewhere now. fifteen thousand two hundred twenty four (talk) 09:42, 16 June 2026 (UTC)[reply]

CS1 does not change the wiki text. It removes the archive from displaying at all at render time. I’ll try to get a before and after. Dw31415 (talk) 14:32, 16 June 2026 (UTC)[reply]

{{Deprecated archive
 |sourceurl=https://example.com/source-page
 |title=Source page
 |archivehostpath=archive .ph/YYYYMMDD/https://example.com/source-page
}}
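
For illustration, a template call in the form shown above could be generated by a small helper like the one below. This is a sketch under assumptions: the parameter names come from the example above, while the function name `deprecated_archive_wikitext`, the scheme stripping, and the pipe escaping are illustrative choices and not confirmed bot behaviour.

```python
import re


def deprecated_archive_wikitext(source_url: str, title: str, archive_url: str) -> str:
    """Render a {{Deprecated archive}} call for one standalone archive link."""
    # Drop the scheme so the value matches the host/path form in the example.
    host_path = re.sub(r"^https?://", "", archive_url.strip(), flags=re.IGNORECASE)
    # A literal pipe would end the parameter early; {{!}} is the usual escape.
    safe_title = title.strip().replace("|", "{{!}}")
    return (
        "{{Deprecated archive\n"
        f" |sourceurl={source_url.strip()}\n"
        f" |title={safe_title}\n"
        f" |archivehostpath={host_path}\n"
        "}}"
    )
```

Because the archive host and path are kept as a parameter, nothing is lost from the wikitext, and how (or whether) the link is displayed stays a matter for the template rather than for another bot run.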

Should I ping respondents to Wikipedia talk:Archive.today guidance#Wrap standalone, blacklisted link or should we keep support/oppose there? Dw31415 (talk) 15:12, 14 June 2026 (UTC)[reply]

We already had a full consensus discussion to do this - the RFC that had consensus to remove all archive.today links was exceptionally well attended. It closed with a consensus to go much further than this bot would, and remove every archive.today link, regardless of any hole that would be left. Since that discussion, 2 big things have happened, both of which indicate we should go ahead with this bot expeditiously. The first is the changes to the CS1 template, which hid the majority of the archive.today links in the same sense that this bot would. This proceeded with minimal controversy relative to the size of the change. The second change is the random linking to the Tehran Times when the referrer is Wikipedia. That degrades the utility of the archive, and makes it an unreliable link for our readers in the sense that it doesn't go where it promises it does. Requiring this bot to jump through excessive hoops to check for repairing the dead link is counterproductive when we need to action these removals in a timely manner. I can get on board with implementing whatever Dw31415 can implement quickly. If any of these checks on repairing the link are technically difficult or time consuming, we need to proceed without them, and invite the objectors to come in behind the bot and do them in a second pass. Tazerdadog (talk) 18:18, 14 June 2026 (UTC)[reply]
Thanks. Just to underscore the flexibility of Template:Deprecated archive: it’s designed so the community could decide to reverse the decision and display the deprecated links again just by updating the template (no need to touch the pages again). Dw31415 (talk) 18:51, 14 June 2026 (UTC)[reply]

I spent about 90 minutes working out the plan for the work queue (https://gitlab.wikimedia.org/dw31415/cutlass-bot/-/blob/main/queue-implementation-plan.md?ref_type=heads). I hope to get some guidance soon from BAG on next steps. I'll be away for Monday & Tuesday. I'll be able to respond by phone but not able to work on the bot. Dw31415 (talk) 03:57, 15 June 2026 (UTC)[reply]

This is my understanding of the RFC as well, WP:NOMOREATODAY closed asking that as soon as practicable we remove all links to it, which is a rather strong result when considering that none of the initially proposed options solely concerned removal (Option A was removal/hiding). I see no qualifiers there that links should be removed, but only after the original site is determined to be live, just that they should be removed as soon as it's feasible. With a bot it's feasible, and the proposed approach using {{deprecated archive}} is highly reasonable, essentially mirroring the CS1 hiding that is already widely deployed without issue. When it comes to actioning the existing consensus I see no reason to oppose the proposal here. fifteen thousand two hundred twenty four (talk) 20:12, 15 June 2026 (UTC)[reply]

@Tazerdadog: I went through WP:NOMOREARCHIVETODAY and see nothing that supports mass removal by bot in the manner you suggest. There is strong consensus to deprecate. There is also strong consensus to get rid of these links as soon as practically feasible, in the sense that url removal must be minimally disruptive and preferably replaced by an alternative.

If what is desired is the hiding of these links from readers, {{cite xxx}} templates can be updated to hide archive.today links and put them in a maintenance category. Same from {{webarchive}}. Once that's done, we can look at bots wrapping raw urls in a similar fashion. Headbomb {t · c · p · b} 16:46, 16 June 2026 (UTC)[reply]
Pinging Voorts into this conversation - as the closer he's better positioned than I am to comment on the closure. Are there a significant number of archive.today links still visible to readers that are wrapped in a cite x or webarchive template rather than as bare links? If that's the case then I absolutely agree we should fix those and then circle back to this discussion afterwards. I thought the CS1 change had taken care of them, but I could easily be wrong. Tazerdadog (talk) 16:58, 16 June 2026 (UTC)[reply]
Serves me right for trying to use the visual reply tool for anything - @Voorts: Tazerdadog (talk) 17:00, 16 June 2026 (UTC)[reply]
What's the question? voorts (talk/contributions) 17:43, 16 June 2026 (UTC)[reply]
Trying to phrase carefully so I don't put words in someone's mouth:

Does your closure at the Archive Today RFC imply community consensus in favor of using a bot to address the archive.today links assuming that such a bot is the only known way to address the links in a timely manner?

Are there any checks that the bot should perform while wrapping the link, such as checking for a live link, checking for an alternative archive, or marking links as dead that should be performed while the bot runs to remain consistent with the community's consensus? Are there any that it must perform, even if it slows the development of the bot and the eventual addressing of the links?

Is the fact that we're implementing a half measure by hiding the archive link from readers while retaining it in plaintext a fatal issue with complying with the close, given that we have been unable to get a solution to fully remove the links moving? Tazerdadog (talk) 18:05, 16 June 2026 (UTC)[reply]
The close deprecated archive.today and said all links should be removed, not just hidden. There was no consensus in the discussion to merely hide the links. The close did not speak to whether we should use a bot, but I don't see why that would be objectionable. voorts (talk/contributions) 18:20, 16 June 2026 (UTC)[reply]
"removed as soon as practical". That last part is important. Headbomb {t · c · p · b} 18:39, 16 June 2026 (UTC)[reply]

IIRC, the question of hiding vs. removing was addressed in the RfC. voorts (talk/contributions) 18:27, 16 June 2026 (UTC)[reply]
Thank you for the quick answer Voorts. @Headbomb: - is this sufficient to demonstrate community consensus for the bot task, or should we start additional discussions to establish it? @Dw31415: - can the bot be modified so that the link is removed instead of simply placed in the Wikitext, ideally in a way that is reversible or that allows other editors/bots to follow behind yours and check whether a different archive matches the archive today citation that was removed? (this could be as simple as a table of citations and archive today links off in wikispace somewhere) Tazerdadog (talk) 18:41, 16 June 2026 (UTC)[reply]
IMO that RFC has no consensus on any specific "what is the next step" item. That, to me, is something to be hashed out first before bots are coded. Headbomb {t · c · p · b} 18:44, 16 June 2026 (UTC)[reply]

@Voorts: Yes, and while everyone agrees that removal is the ultimate goal, that doesn't mean it is the first step. Headbomb {t · c · p · b} 18:41, 16 June 2026 (UTC)[reply]
The consensus was to deprecate and remove all the links as soon as possible voorts (talk/contributions) 18:43, 16 June 2026 (UTC)[reply]
"As soon as practicable" is the wording of the close. That means not mass removed by bot, unless the community decides that it doesn't want to wait and does not want intermediate steps done (like a bot run to find alternative archives). Headbomb {t · c · p · b} 18:45, 16 June 2026 (UTC)[reply]
That is some serious wikilawyering. voorts (talk/contributions) 18:46, 16 June 2026 (UTC)[reply]
Bots require clear mandates. That RFC is not a clear mandate. "We ought to start running as soon as we're ready" does not mean "We ought to start running NOW", especially when we aren't ready, and that people haven't even decided what 'ready' looks like. Headbomb {t · c · p · b} 18:52, 16 June 2026 (UTC)[reply]

The RfC said we should remove the links as soon as possible. A bot would allow us to do that. What is your objection to a bot doing it? voorts (talk/contributions) 18:59, 16 June 2026 (UTC)[reply]

"As soon as practicable" not "As soon as possible". Headbomb {t · c · p · b} 19:15, 16 June 2026 (UTC)[reply]

Dry run added: User:Dw31415/ArchiveEdits1 Dw31415 (talk) 12:21, 15 June 2026 (UTC)[reply]

Note the first example there sets the title of the archived page to be "Archived" which is obviously incorrect. If the bot is going to add errors of this nature to the encyclopaedia then that is another reason to oppose. Thryduulf (talk) 07:36, 16 June 2026 (UTC)[reply]

@Thryduulf, I agree the “archived” case should be examined more carefully. There are two other options I considered:

Changing the link to an interstitial webpage that gives a warning, a click-through, and information about whether a copy exists at the Wayback Machine.
Changing the link to a WP page that explains the situation and how to find the original archive link

Do you find either of these less objectionable? Dw31415 (talk) 14:38, 16 June 2026 (UTC)[reply]

p.s. here is a mock up of an interstitial https://dw31415wp-glitch.github.io/archive-checker-bot/?url=https://archive-today/2025.01.01-120000/https://example.org/article Dw31415 (talk) 16:16, 16 June 2026 (UTC)[reply]

(edit conflict) If a webpage is linked, directly or indirectly, from an article in any form other than a bare url then any metadata about that page (including its title) should be recorded (and displayed if the link is displayed) correctly. This applies regardless of whether the link is to a live webpage or to an archive and, if the latter, which archive that is.

Really every link processed by this bot should be left in one of four states:

A link to a live copy of the content that supports the associated article text (with or without an archive, AT or otherwise)
A link to an acceptable archive of the content that supports the associated article text
An explicitly marked dead link with some sort of note that an archive exists at AT with some explanation why it isn't being linked (this can be inline, via a linked page or some combination)
An explicitly marked permanently dead link (there is no benefit to even mentioning a broken AT archive)

Thryduulf (talk) 16:18, 16 June 2026 (UTC)[reply]
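
The four end states listed above can be written as a small decision function. The Python sketch below is illustrative only: the names `Outcome` and `choose_outcome` and the precedence order (live copy first, then an alternative archive) are assumptions, and the three inputs are exactly the judgements this discussion describes as hard for a bot to make reliably.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    LIVE_LINK = "link to a live copy of the content"
    ALTERNATIVE_ARCHIVE = "link to an acceptable archive"
    DEAD_WITH_AT_NOTE = "dead link, with a note that an AT archive exists"
    PERMANENTLY_DEAD = "permanently dead link"


def choose_outcome(
    source_is_live: bool,
    alternative_archive: Optional[str],
    at_archive_works: bool,
) -> Outcome:
    """Pick one of the four end states for a processed link."""
    if source_is_live:
        return Outcome.LIVE_LINK
    if alternative_archive:
        return Outcome.ALTERNATIVE_ARCHIVE
    if at_archive_works:
        return Outcome.DEAD_WITH_AT_NOTE
    return Outcome.PERMANENTLY_DEAD
```

Here `source_is_live` stands for "the live page still supports the article text", not merely "the server did not return a 404".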

I would probably oppose those solutions - if we have the link, it should directly go to where we said it was going to go. If we're not willing to honor the link destination, which we are not for archive today, we should not have a clickable link. In any case, I'd like to see a high quality consensus discussion authorizing these before we seriously consider implementing them. Tazerdadog (talk) 18:26, 16 June 2026 (UTC)[reply]

Any thoughts about how to have that discussion? I’m leaning to modifying the proposal so the template starts with the status quo behavior. This would allow the community to act more easily through the template (just like CS1). That might allow the bot folks to stay out of the consensus building business. Dw31415 (talk) 20:21, 16 June 2026 (UTC)[reply]

If I was proposing a way forward from here, I'd first make sure that this comment from @Headbomb: didn't get lost in the shuffle:

If what is desired is the hiding of these links from readers, {{cite xxx}} templates can be updated to hide archive.today links and put them in a maintenance category. Same from {{webarchive}}. Once that's done, we can look at bots wrapping raw urls in a similar fashion.
— User:Headbomb

If we have any low hanging fruit contained in these templates, we should at least disable reader facing links while we have the followup conversation via a single edit to the templates.

Following that, I'd defer to the BAG. I think we've made the best case we currently have for an existing consensus with the closer of a very well attended RFC coming in to this BRFA discussion. If that's good enough, great. If it's not, I'd ask what we do need, then hold that discussion. I could see a case for needing a more recent consensus, for needing a consensus to do something specifically with a bot instead of just as soon as (possible/practicable), or a clarification on what the bot needs to do versus what can/should/technically must be left undone, or a clarification on the final desired state (removal with no easy way to undo it, removal with an undo button/database in the template, hiding it in the wikitext so a volunteer could repair but no layman would find it, hashing the citation so that if you don't know the trick we tell to citation repairers you can't find it, etc.) Tazerdadog (talk) 00:21, 17 June 2026 (UTC)[reply]

Looking a little deeper, it looks like Chaotic Enby pushed changes to the webarchive template shortly after this started, and the cite web talkpage redirects to CS1, so that might be already done? Tazerdadog (talk) 00:29, 17 June 2026 (UTC)[reply]

Yes, those changes to CS1 and webarchive have already hidden 600k-ish links. This bot proposal targets the remaining 100k-ish. Dw31415 (talk) 01:33, 17 June 2026 (UTC)[reply]

@Fifteen thousand two hundred twenty four, @Headbomb, @Sapphaline, @Thryduulf, @Voorts: Just FYI, these websites were added to the global spam blacklist yesterday. This affects all WMF wikis, not just the English Wikipedia. WhatamIdoing (talk) 02:17, 17 June 2026 (UTC)[reply]

The websites have been added to our local whitelist, and the current edit filter is keeping everything out with a more customized error message and fewer side effects than a spam blacklist. For most practical purposes this should take us back to the status quo of the last 3 months. Tazerdadog (talk) 03:06, 17 June 2026 (UTC)[reply]

@Headbomb, thank you for reviewing and for your work on BAG. I understand your position that the Wikipedia:NOMOREATODAY RfC does not provide sufficient consensus for a bot to remove links. Please note that this proposal doesn't actually remove the archive today links. It wraps standalone archive today links in a new template. CS1 and webarchive already^[3] hide the archive today links from readers. This bot would empower the community to more effectively implement the RfC through modification of the template. Is there any role for a bot based on the decision of the RfC and the consensus in the CS1 and webarchive moves? Thanks! Dw31415 (talk) 04:15, 17 June 2026 (UTC)[reply]

References

[1] Wikipedia talk:Archive.today guidance#c-Iam-py-test-20260612142000-Dw31415-20260610112000

[2] Wikipedia talk:Archive.today guidance#Wrap standalone, blacklisted link

[3] https://en.wikipedia.org/w/index.php?title=Module:Citation/CS1/sandbox&diff=prev&oldid=1339539206
