User talk:Rlink2/Archive 2

Web Archive/ WP:LINKROT discussions

Changing https://archive.ph to https://archive.today doesn't do any good. It just redirects back to the original version, so stop changing these links. -- Valjean (talk) 06:48, 6 October 2021 (UTC)

It is supposed to redirect back to one of the mirrors. The idea is that if he loses one of his domain mirrors (this has happened before), he can use archive.today to redirect to the next mirror. This is reflected in the branding (even on archive.ph, the site still says "archive.today", as do the share links), and he made a statement about it on his own blog: https://blog.archive.today/post/659307974748160000/because-you-have-multiple-domains-can-you-tell-me Rlink2 (talk) 11:15, 6 October 2021 (UTC)
Yeah he specifically asked us (Wikipedia) to use archive.today for reasons noted by Rlink2. -- GreenC 01:52, 7 October 2021 (UTC)
Okay. That makes sense. Keep up the good work. -- Valjean (talk) 02:04, 7 October 2021 (UTC)
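The domain change discussed above can be sketched as a small URL rewrite. This is only an illustration, not the tool actually used for these edits: the function name is hypothetical, and the mirror list is an assumption (only archive.ph and archive.today are named in this thread).

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed set of mirror domains; only archive.ph is named in the discussion.
MIRROR_HOSTS = {
    "archive.ph", "archive.is", "archive.li",
    "archive.vn", "archive.fo", "archive.md",
}
CANONICAL_HOST = "archive.today"


def to_canonical_archive_url(url: str) -> str:
    """Rewrite a mirror URL to archive.today; leave any other URL untouched."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in MIRROR_HOSTS:
        return url
    # Keep scheme, path, query and fragment; swap only the host.
    return urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
```

For example, `to_canonical_archive_url("https://archive.ph/abc12")` gives `https://archive.today/abc12`, while a URL on an unrelated host is returned unchanged.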

Can you explain why you added an archive url at ghostarchive.org to multiple references in the Brandy Norwood article today? Your edit summary didn't make clear what's going on here. -- C.Fred (talk) 16:54, 23 October 2021 (UTC)

See https://en.wikipedia.org/wiki/Wikipedia:Link_rot, specifically https://en.wikipedia.org/wiki/Wikipedia:Link_rot#Manual_archiving. Sometimes I preemptively archive links; I've been doing it for a while. The edit summary was a reference to one of her songs. Sometimes I like to have a little fun there ;) Rlink2 (talk) 16:58, 23 October 2021 (UTC)
Hey Rlink2, you made an edit to Sammy Malakwen and did something I've never been able to do ... archive a YT video. So is this the official way to do it now? Or is this a rickety operation that might not be around long (that sounds harsh! I mean it sincerely and with respect and hope!) Comm260 ncu (talk) 02:53, 25 October 2021 (UTC)
I looked into it recently and discovered that the WaybackMachine has improved its video (or YouTube) archiving ability. As a test I saved the Sammy Malakwen video [1], but it requires "a few days" to derive and appear. It will be interesting to see if it works. No one can know the future of ghostarchive.org, but by Bayesian reasoning it will probably not last long, because it has only lasted a short time so far (the rule is to assume it has lasted 50% of its life so far, given no other information to go by). That shouldn't stop us from using it, since we really don't know. They offer a useful service because they use Webrecorder on the back end, which is IMO superior to anything else. -- GreenC 15:07, 25 October 2021 (UTC)
Stellar! Let's roll it out for YT archives from now on. Thanks! Comm260 ncu (talk) 17:54, 25 October 2021 (UTC)
GreenC took the words right out of my mouth about the Bayesian probability thing. I see it like this: there are no other sites that I know of that can archive YouTube videos properly and on demand. Wayback doesn't seem to archive videos on demand, and the Wayback video player never works properly in my browser (maybe that's just me). If ghost shuts down (in my opinion, the site looks like it's in a better place than Webcite, which has been around for 15 years), everything returns to the way it was before. So even though the site is relatively new, for YouTube and video sites, something is better than nothing. YouTube archival is more important than ever: Google has begun deleting old and unlisted videos, and replacements for a dead video source might be hard to find, since there are many concepts and ideas that can only really be shown in a video format, and since people only upload videos to big sites, whereas text content is everywhere. Rlink2 (talk) 20:01, 25 October 2021 (UTC)
FWIW the Wayback video link appears to work now, at least on my Firefox browser, though it is slow to load. The snapshot was made on demand via SavePageNow. This is a short video; I have not tested longer ones. I think Wayback made upgrades in the past few months specifically for YT. Not sure if they are automatically saving when a new link is detected on Wikipedia; that would be interesting to know. Still think Ghost is fine; diversity is good. -- GreenC 16:19, 26 October 2021 (UTC)
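The "assume it has lasted 50% of its life so far" rule mentioned above reduces to simple arithmetic. A minimal sketch, with a hypothetical function name and the 50% figure exposed as a parameter; it is a rough point estimate under the stated assumption, not a prediction about any particular site:

```python
def estimate_remaining_lifetime(age_so_far: float, fraction_elapsed: float = 0.5) -> float:
    """Estimate remaining lifetime, assuming `fraction_elapsed` of the total
    lifetime has already passed (0.5 is the rule quoted in the discussion)."""
    if not 0 < fraction_elapsed <= 1:
        raise ValueError("fraction_elapsed must be in (0, 1]")
    # total = age / fraction, so remaining = total - age
    return age_so_far * (1 - fraction_elapsed) / fraction_elapsed
```

With the default of 0.5 the estimated remaining lifetime equals the age so far, so a site that has existed for 15 years gets an estimate of 15 more, and a site that has existed for one year gets an estimate of one more.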

Hi, and thanks for your work on preemptively adding archive urls. If you happen to be checking the links, please add a url-status=live parameter, so that readers clicking on the reference title link are taken to the main thing and not the archive. Firefangledfeathers (talk) 19:06, 27 October 2021 (UTC)

I was under the impression that a url-status parameter was not needed unless the URL was dead. It should automatically default to the actual version, with the archived version only kicking in when url-status=dead. I will start adding it, however, just to be sure. Thanks Rlink2 (talk) 19:18, 27 October 2021 (UTC)
It's a fair impression. You are right, except that the situation changes when archive-url is present. The template then starts assuming the original link is dead. Firefangledfeathers (talk) 19:25, 27 October 2021 (UTC)
You re-directed a live URL to Ghost Archive. The source for Ghost Archive is a Tumblr blog. Since the use of Tumblr blogs is unacceptable in Wikipedia, Ghost Archive should not be used as an InternetArchive substitute -- until it has been vetted by the Wikipedia community and accepted as an alternative archive source.
In how many articles have you turned live URLs into archive status using Ghost Archive? Pyxis Solitary (yak). L not Q. 09:02, 31 October 2021 (UTC)
@Pyxis Solitary: First of all, they use Tumblr to communicate, just like archive.today. Ghostarchive is not a "source"; it is an archive site. I have never posted an archive URL to their Tumblr page.
Second of all, like I said before, ghostarchive archives YouTube videos and no other archive does that. Archive.org YouTube archiving is flaky, doesn't work all the time, and is not on-demand archiving, but they are working on improving it. If ghost goes down, then we will be back where we started (which is before ghostarchive). It's not as if ghost is archiving stuff that archive.org or archive.today or webcitation could do, so that we would miss that opportunity if ghost went down.
Oh whoops, as for the url-status=live parameter, apparently it hasn't been getting added with AWB. I have been adding it when I archive manually though, see here: https://en.wikipedia.org/w/index.php?title=Impact_of_the_COVID-19_pandemic_on_sports&diff=prev&oldid=1052791898. Will have to look into that, still getting the hang of things. Thanks for the trout slap.
(EDIT): In my experience, about 25% of YouTube links are already dead, and Google is making changes to remove videos all the time: https://support.google.com/youtube/answer/9230970?hl=en . Getting as much as we can now is important, because we don't know what the future holds. The longer we wait, the more links die out. Rlink2 (talk) 11:44, 31 October 2021 (UTC)
To clarify (I am a disclosed IA employee), things are different now at archive.org: they have very recently redone their implementation of YouTube archiving; you can do on-demand archiving (I posted an example above, or just try it yourself); they are planning to soon automatically archive into the WaybackMachine every YouTube link that currently exists across all 900+ Wikimedia projects; to soon automatically archive into the WaybackMachine every YouTube link that is newly posted to Wikipedia going forward; and to run IABot on 150+ projects, which will insert those archives into pages after the links are determined to be dead. So, there is nothing wrong with Ghost archive. It's not a competition between providers; diversity is good (IABot supports 20+ providers). My question is whether we want to be adding Ghost at scale even when the link is live. The other thing is that I believe what you are doing with AWB should have bot approval; see the second paragraph of WP:MEATBOT and the second paragraph of WP:ASSISTED. -- GreenC 15:45, 31 October 2021 (UTC)
As long as it's directing to the original source, I don't see a problem with preemptive archiving. I had a misconception that url-status was not needed, but apparently it is, so I started adding url-status=live to any link that is live, and url-status=dead to any link that is dead (which is usually for Wayback links, since they have the deeper coverage). I review every edit I make with AWB, and I never leave it unsupervised, but it looks like you are right about the high-speed editing. I will stop using AWB for that purpose, or at least slow down significantly. Thanks. There seems to be consensus that I should add url-status=live (which I have been doing), and no consensus against adding archive links in general (after all, IABot has a feature to preempt as well). But since consensus seems to be changing, I will limit AWB use for now. I do have plans to make a bot, but I want to do that after I focus on content creation and bring some articles I had in mind to GA status. I will put in a bot request for some of my uses of AWB; the preemptive archiving might just be manual via AWB or my browser, since there is middling consensus at least for now. For what it's worth, I've been edit-thanked 5 or 10 times for my preemptive archiving efforts, so consensus is clearly leaning positive.
As for the Wayback thing, it's great that they are redoing their implementation. I would be more than happy to use that as well. But their video HTTP server does not support the Range header, so I can never seem to seek in videos there (Ghostarchive also has its own seeking issues: sometimes the video will stay still while the audio continues for a second or two after seeking), and it doesn't seem to work for me in any WebKit-based browser. Also, requesting a video takes days, while on ghost it takes seconds. If they could fix that, it would be nice. I would love to use w.a.org for YouTube videos as well. Rlink2 (talk) 16:08, 31 October 2021 (UTC)
I support preemptive archiving. I would recommend against adding url-status=dead to refs that have archive links. It has no functional or cosmetic purpose. Firefangledfeathers (talk) 14:44, 1 November 2021 (UTC)
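The url-status point in this thread (a citation with archive-url but no url-status is treated as dead) can be illustrated with a rough wikitext pass. This is a sketch only, not how AWB or IABot handle it: the function name is hypothetical, the regex only handles simple `{{cite ...}}` templates with no nested templates, and it assumes every link passed in has already been checked and found live.

```python
import re

# Matches simple {{cite ...}} templates that contain no nested braces.
CITE_TEMPLATE = re.compile(r"\{\{\s*cite [^{}]*\}\}", re.IGNORECASE)


def add_live_status(wikitext: str) -> str:
    """Append |url-status=live to cite templates that have archive-url
    but no url-status. Assumes the original links were verified as live."""

    def fix(match: re.Match) -> str:
        template = match.group(0)
        has_archive = re.search(r"\|\s*archive-url\s*=", template)
        has_status = re.search(r"\|\s*url-status\s*=", template)
        if has_archive and not has_status:
            # Drop the closing braces, add the parameter, close again.
            return template[:-2].rstrip() + " |url-status=live}}"
        return template

    return CITE_TEMPLATE.sub(fix, wikitext)
```

Templates that already carry a url-status value, or that have no archive-url at all, are left as they are.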

IABot

Hi Rlink2, not sure if you were following Wikipedia:Link_rot/URL_change_requests: a spammer has been re-registering dead domains that exist on Wikipedia and redirecting them to a gambling site in Indonesia, spamming Wikipedia without making a single edit (passive spamming). Hard to detect and hard to fix. We've identified about 115 domains so far. One of the steps is that the domains need to be blacklisted in the IABot database so it propagates out to 150+ other wikis where the bot archives the links. Setting the blacklist is a bit time-consuming because you have to wade through a list of candidate sub-domains, i.e. *.domain.com might show site.domaincom.com, which would not be blacklisted. For an example: iabot.org -> Manage URL data -> Manage entire domains -> hinduonnet.com (not exact match) -> [list of candidates] (in this case it would be "Select all") -> Global live state = Permadead, and that's it. IABot is now informed to start archiving links on 150+ wikis. You might not see Permadead (blacklist) as an option without admin privs, which can be granted. If this looks at all interesting, let me know. I thought about posting for help somewhere public, but you seem interested in archiving, and this could be a high-impact thing for Wikipedia. -- GreenC 16:47, 26 October 2021 (UTC)

This is certainly interesting. But I am confused about how the blacklisting works. Are you saying the regex is not accurate? Or are you saying each domain also needs to have its subdomains added? (That shouldn't be a problem, since most sites have no subdomain or only a www subdomain.) This is once again why I think archiving is very important. Rlink2 (talk) 19:43, 26 October 2021 (UTC)
Ok great! Your account is upgraded; you have perms to permadead (blacklist) domains. A better example (don't actually blacklist it) would be time.com, since it shows a lot of false positives, and you would want to checkbox anything *.time.com, including time.com itself, and skip the rest. -- GreenC 01:54, 27 October 2021 (UTC)
I think I also need "changedomaindata" as well :) Rlink2 (talk) 02:48, 27 October 2021 (UTC)
Ok added some more: User flags: analyzepage, blacklistdomains, blacklisturls, changedomaindata, changeurldata, deblacklistdomains, deblacklisturls, dewhitelistdomains, dewhitelisturls, reportfp, submitbotjobs -- GreenC 03:06, 27 October 2021 (UTC)
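The candidate-picking step described in this thread (tick time.com and anything under *.time.com, skip look-alike matches) can be sketched as a suffix check on the hostname. This is not IABot's own matching logic, which is not shown here; the function names and the sample candidate list in the usage note are hypothetical.

```python
def is_domain_or_subdomain(candidate: str, target: str) -> bool:
    """True if `candidate` is `target` itself or a real subdomain of it."""
    candidate = candidate.lower().rstrip(".")
    target = target.lower().rstrip(".")
    # The leading dot keeps look-alikes such as "nytime.com" from matching.
    return candidate == target or candidate.endswith("." + target)


def select_candidates(candidates: list[str], target: str) -> list[str]:
    """Keep only the candidates that should be ticked for `target`."""
    return [c for c in candidates if is_domain_or_subdomain(c, target)]
```

Given a made-up candidate list such as `["time.com", "www.time.com", "nytime.com", "time.com.au"]` and the target `time.com`, only the first two entries are kept; the last two are the kind of false positives that would be skipped by hand in the interface.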