Hello! Thank you for using yiff.party!

We hope you enjoy using it. yiff.party operates on a non-profit basis, and as such, all the server expenses are paid by our users. We don't want to run ads or infect you with crypto miners. We depend on users like you to keep the site running, and to preserve years and terabytes of amazing content—some of which is no longer available from its original creators!

Because of the nature of the site, many users are reluctant to donate. That's OK! yiff.party was created so everyone can enjoy the content we host without any restrictions or paywalls. But if you value the service we provide, and are able to, we—and our users—would be tremendously grateful if you considered making a donation.

Donation progress for January 2020

So far, approximately $182.48 has been raised out of our target of $280.00. We're about 65% of the way there! Please note: this tracker is updated manually—don't worry if your donation doesn't show up immediately!

yiff.party's server costs are due on the last day of each month. So, we need to meet this goal before 31 January!

How to donate?

At this time, yiff.party can only accept donations in a number of cryptocurrencies. Please select a currency below to display the relevant donation address.

Bitcoin (BTC)
Bitcoin Cash (BCH)
Ethereum (ETH)
Ethereum Classic (ETC)
Litecoin (LTC)
Why can't I donate through other means (e.g. PayPal)?

Due to the nature of sites like yiff.party, it is very difficult to find payment processors who will accept clients like us. If we were to accept donations via PayPal, it wouldn't take more than a day for someone to submit an abuse report and get our account frozen. Until a viable way of accepting monetary donations becomes available, cryptocurrency will remain the only option.

There are many resources available on how to purchase crypto. For Bitcoin, check out bitcoin.org's page on buying Bitcoin for a list of methods. For beginner Bitcoin users, yiff.party recommends using an escrow service such as LocalBitcoins.

Contingency: Full Site Archive?

Backup.png (193.2KiB, 512x512)

Please tell me to fuck off if I'm out of line; I'm an archivist and this site has fallen into my sight.

First, is there any semi-official way to take a full backup of the site?
I'll be throwing some money into their coffers, but that isn't everything here.

Archiving Patreon itself is a nigh-impossible task due to site changes, and short of re-writing yiff.party from scratch, archiving at a snail's pace, and rebuilding the userbase, there'd be no easy way to get it back.

Secondly and finally, is there any word from the devs on sharing their back end?
So far I've found one empty GitHub account and heck all else. Other than the BBS, and short of emailing the operators directly, I haven't much to go on.
(I'm somewhat hoping they'll stumble across this thread, but can't be sure).
(I do realize releasing the backend publicly could make it much simpler for Patreon to implement countermeasures, but even an encrypted copy of the source, with the decryption key released after a failure, would be nice)

I have no intention of forking the site, but keeping backups has proven invaluable in the past, and when they weren't made, it took years to recover the content and functionality, if it was ever recovered at all.

There are other threads already discussing manual archival of individual pages; rather than an unknown number of users all individually thrashing the servers, I'd wager a more official way would not only be more complete, but also drastically reduce server load and running costs.

I'm not quite sure what Archive.org would think of hosting backups in the couple-terabyte range, given this site's nature, but it's always worth a try.
Regardless, contingencies are criminally underrated. The number of sites where I've asked about database backups, been told all is fine, only for the site to vanish without a trace in the following weeks, still makes me wish I'd done more.

Rebuilding this site would be a monumental task, and it's better to back up now than to lose countless terabytes of data.

So, after a little reading (and with another 2k characters to post with), I've learned some things:
1: The site is currently "using" around 8TB of storage, which could mean 4TB of data backed up, 8TB not backed up, or 8TB backed up invisibly.
2: Archive.org already has a fair chunk of the site on hand; however, restoring from their archive can be a pain, and there's currently no feature to check out a dissolved site as an 8TB .WARC.
3: The admins don't pay for bandwidth, so scraping to your heart's content will (*should) not affect running costs.
4: Backups from site owners are always less painful for all involved than scraping.
5: Just saying.
6: (That doesn't mean scraping isn't an option)
7: I have a 150TB stack of empty, unused drives on my desk.
8: If the site owners really do not have to pay for bandwidth, then I'd imagine they'd have no objection to anyone trying to perform a full site backup.
9: Google Cloud gives all new users $300 of credit for signing up, which is enough to spin up a VPS with 10TB of disk space (and 20TB of bandwidth). That could be used to archive the site and upload it to a Pro III tier mega.nz account (8TB of storage), after which the Google Cloud instance could be deleted to prevent any undue costs while the mega account slowly downloads to a local machine.
10: I don't wanna do that.
11: I've done it before.
12: I really don't wanna fucking do that tho

Someone always donates last minute to save the site so archiving was just a thought.

>>32159 FYI bui loves riding the gravy train

>>32153
>The site is currently "using" around 8TB of storage

I'm curious where you got the 8TB figure from (that could have been my last posted estimate, but I don't recall). If there is an official number I'd love to know; if it's not too high, I'm tempted to archive the whole thing, though there's a lot of junk I'm not really interested in...
Here's some stats from the stuff I have archived (local stats are from yesterday, current download run still in progress)

Site stats
-----------------
Total creators on site: 15,276
Total Posts on site: 2,346,414

Personal archive stats
-------------------------------------
Creators archived: 2,483 (6.15% of total)
Posts archived: 487,337 (20.77% of total)
Files archived: 613,046
Space used: 2.452 TB
Average filesize in local archive: 4.000 MB

tbc...

External site archive stats (several months out of date)
----------------------------------------------
External content pulled from GDrive, Dropbox, Imgur, Sta.sh, gfycat, mega (incomplete due to free-tier limitations)
External content archived - file count: 270,214 (This is going to cause me issues soon, damn...)
External content archived - quota: 2.106 TB

My estimates
------------------------
Estimated site storage based on average creator size: 15.085 TB
Estimated site storage based on average post size: 11.806 TB
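For reference, the estimation itself is nothing fancy; it's roughly the following (a minimal Python sketch, not the actual script, using the numbers posted above):

# Rough whole-site storage estimate, extrapolated from my local archive.
# Figures are the site/archive stats posted above.
total_creators = 15_276
total_posts = 2_346_414

archived_creators = 2_483
archived_posts = 487_337
archived_bytes = 2.452e12  # 2.452 TB used locally

# Scale the average size of what I've archived up to the whole site.
est_by_creator = archived_bytes / archived_creators * total_creators
est_by_post = archived_bytes / archived_posts * total_posts

print(f"by creator: {est_by_creator / 1e12:.3f} TB")  # ~15.085 TB
print(f"by post:    {est_by_post / 1e12:.3f} TB")     # ~11.806 TB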

I'd really prefer it if someone had the complete archive with post data; I don't really have the bandwidth, and it would literally take me a year to upload what I have :)


Also, 150TB is a LOOOT of storage, what's the average drive size you have?
I'm currently running out of space on my active site-archiving storage, waiting on my new drives to ship...

Oh, found my old post which did say 8TB, it's really interesting that the average post size hasn't changed at all...

>>32159
It's not money that's keeping this shit alive, it's goodwill and hard work.
Money doesn't save websites, backups save them.
Money just means you usually never need the backup.

>>32176
>I'm curious as to where you got the 8TB figure from
In >>16643(>>12795), "Anonymous ## Admin" mentions the 8TB figure, that's in late 2018, and it looks like, at the time, they didn't plan on backing up.
Further comments in >>29903, but nothing from a mod. If it was already "Upwards of 8TB" in October '18, we can guess it's probably hit 10TB by now.

>Here's some stats from the stuff I have archived
>Posts archived: 487,337 (20.77% of total)
>Space used: 2.452 TB
That puts the site at about 11.8TB, so it's growing fast.

I'm curious about precisely what you're using to scrape the site, I assume a custom system.
If you're not just getting everything, what are you using to decide what gets downloaded? Recommendations from the archive thread?

>>32177

>I'd really prefer if someone had the complete archive with post data, I don't really have the bandwidth, it would literally take me a year to upload what I have :)
Well, that's what I'm here for, just need to work out a way of going about things.

>Also, 150TB is a LOOOT of storage, whats the average drive size you have?
It's been nearly a decade since I got serious about data; that works out to less than 20TB a year, or 10TB of backed-up data. That's like 30GB a day for 10 years.

I have 10x 8TB disks, five or so 4TB's, another 10x 3TB's, and a shelf full of 2TB's, all sitting empty and unused.
I migrated all my data (from the last 10 or so years) to a SAN, and all the disks filling up my cobbled together storage servers from days past are now surplus. Not the best for live storage, but if I see a site like this I wanna archive, I'll download it, throw it on a pair of disks (or a couple pairs), and then stash them away until they're needed.
I'd probably wanna do a running scrape of this place; there's always plenty of space on my SAN, so I'd just dump it to cold storage every month and everyone's happy.

>I'm currently running out storage on my active site archiving storage, waiting on my new drives to ship...
I'm curious what size you're buying, things are different stateside but Down Under™ the cheapest you can get is a 3TB 2.5" USB 3.0 external for 99 dollarydoos (70 freedombux)
Way back when, there were 8TB 3.5" externals in the US for 200 freedombux, so I bought a bunch while I could. I'm curious how things are going now?

Anyway, ideally I'd have a scraper that I could tell to rip EVERYTHING, then, once it's done one epoch, start again and grab new files.
Only thing I'd be concerned about is files being removed from yiff because of spam but staying in my download, or getting 404'd and being wiped from the backup too.

Anyway, thanks for the tips, looks like I'm not alone in my desire to keep this place safe.

>>32181

Ahh, a fellow Aussie :)

>>32159
> It's not money that's keeping this shit alive, it's goodwill and hard work.

To add to your point, it's always interesting seeing people complain about the donations. It feels expensive, but the hosting actually seems fairly robust and reasonable for that price, and for that amount of money nobody would develop and maintain this site if they were after a profit; it's definitely somebody's hobby.

Interesting, it looks like this method gives fairly accurate estimates then:
> Estimated site storage based on average post size: 11.806 TB

I'm currently shucking WD MyBooks from Amazon, just grabbed 2x 10TB for about $410 shipped (started a Prime trial just for that). Check out 'ozbargain' for deals specific to Aus if you haven't.
I've got 6x 8TB in my usually-offline NAS/server/box (also recently built), mainly for 'linux isos' and backups.

I have an external that I occasionally use to ship data up/down from the cloud on a fast connection

My 'hot' storage for active web archives is currently 6TB and is full, so I'm in the process of just archiving straight to the cloud. The 'external' sites that I archive never hit my local storage, which also means I never really see what gets archived (a problem I am yet to solve); again, limited bandwidth means I can't just stream it down on the fly, which is what got me into archiving in the first place.

>>32181

For what I want to do these are my requirements:
- I want all the post data and metadata (minus comments)
- I want the archive to be actually browseable

Which means I'm actually pulling apart posts and putting them in a database (MySQL)
I also have a hacked together webapp (local only) to browse/view/search stuff

> I'm curious about precisely what you're using to scrape the site, I assume a custom system.
And yep, it means the scraper is completely custom (mostly Python, some 'legacy' stuff in Perl) and so is a bit unwieldy

For my stuff, since the scraper is aware of 'posts', it only grabs new content and doesn't bother checking existing content, which can be a problem if stuff is updated
The upside is that it is easy to grab the last updated date from every creator on this site
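If it helps picture it, the incremental pass boils down to something like this (a very rough Python sketch; the real thing is tangled into the database layer, and all the helper names here are made up):

# Sketch of the incremental pass only: compare each creator's "last updated"
# date on the site against what's stored locally and only re-scrape creators
# with newer activity. fetch_creator_list, fetch_posts_since and save_post
# stand in for the real (much messier) scraper internals.
def incremental_pass(db, fetch_creator_list, fetch_posts_since, save_post):
    for creator in fetch_creator_list():              # id + last_updated scraped from the site
        local_seen = db.get_last_updated(creator.id)  # None if never archived
        if local_seen is not None and creator.last_updated <= local_seen:
            continue                                  # nothing new, skip the creator entirely
        for post in fetch_posts_since(creator.id, local_seen):
            save_post(db, creator.id, post)           # post text, metadata, attachments
        db.set_last_updated(creator.id, creator.last_updated)
    # Edited posts that don't bump last_updated get missed, exactly the caveat above.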

I can probably help you hack something together if you're committed to keeping an up-to-date full backup of this site

> If you're not just getting everything, what are you using to decide what gets downloaded? Recommendations from the archive thread?
As for choosing what to scrape, it's a fairly automated process actually. I already scrape some other 'art' sites; as part of the scrape I search for external links and Patreon links, which then automatically get added to the archive list. I occasionally manually browse the recommendations

>>32181

Oh, and I suspect you might be on https://www.reddit.com/r/DataHoarder/ ?
If not, you should check it out

I'm archiving the site using one of the scraping tools posted in an earlier topic. It extracts post text and attachments. The tool does not go back and fix 404 errors, nor does it re-fetch posts that have been updated. Currently at 4.3TB with a focus on furry content first. I'm downloading from the smallest post count to the largest. If someone wants to help archive, please download starting with the creators with the largest post counts.

I'm not archiving the forum. I have not pulled in external content. I eventually plan to do that, but by then most of the links will have expired. Keep in mind creators like to post fake links until YP scrapes the post, then they change the links to real ones. Or the links are only valid for a few days, then their contents are removed. It's crazy what some people are doing to try to stop greedy people from being greedy. They must be pissing off a large segment of their followers with all the hoops they need to jump through. The people pirating don't need to jump through most of those hoops.

The admin stated the site has 8TB of content with an additional 4TB of free space. Downloads are throttled, so downloading from one connection literally takes years. I'm too busy to set up a multi-IP cloud solution, nor do I know how that would impact site performance.

I've paid for about 5 months' worth of donations. It would be really nice if other people also donated, and donated before the deadline. I can't keep up paying for this site forever. There's someone else who's been donating about $50 each month. Whoever that is, thank you. I would stop if no one else donated. I don't even use the content; I just hate content randomly disappearing. Data should be forever. Sadly money doesn't last as long.

So are we now sharing what we've been able to back up? Alright, here we go: I have 128 creators backed up with metadata (internal/external URLs, file descriptions, YP page source at the time of crawl, name history) and external files from mega, Google Drive and whatever external URLs my crawler was able to parse. Around 600GB total. It even includes some 404'd files, because I've been doing this for a while.

No idea how I will share them when the time comes. I've had the idea of making a Tor hidden service, but that would need a separate VM in order to prevent people from trying to hack into my NAS.

ID list for those who are curious: 820839,8234000,505617,101982,102533,744091,380670,2817759,804602,500206,106008,5969229,5020303,701772,9272261,170385,460110,243156,575952,372032,4740374,557269,7399401,10411417,5919535,3137267,5668157,5760469,287725,595190,775120,4341007,442520,133338,531869,4259108,3229918,74985,262109,7242260,3605855,5270525,239705,2437195,3214973,316309,823669,4231621,562525,155537,939753,207739,2807763,297832,335709,3122648,578743,11145343,690687,4428205,177432,946301,709210,2812679,963176,4810776,356650,10508401,807720,172500,985059,99155,8525377,13007171,559737,633260,11267057,134633,284545,7561564,7919667,99312,4996145,2523749,4213703,16035948,10740552,180528,8467702,4892368,6906148,91003,80934,301899,252083,308494,84543,3534292,940590,881792,2415611,9348519,243301,596930,2320716,6584805,2457920,3224408,702809,846166,3595424,573329,754009,3712058,90303,75015,5476061,5748416,88996,99423,667690,839050,815350,15873007,670209,12411418,5867219,236176

OP sounds like someone from the Archive Team; if not, he would be an asset to it: https://www.archiveteam.org/

why are you guys not scraping the forum
i mean it's not that big (1GB max size)

>>32244
why scrape the forum?

>>32244

For my purposes, I like to archive things people have put effort into (I think it is a shame when hours/days of work is erased)

Most of the forum is just low effort whinging, maybe interesting if you are a historian, but not to me (wayback probably has it anyway)

I'll get to Archivist in a sec, don't worry.

>>32244
It's already fully backed up on archive.org, it uses a vastly different system than the rest of the site, and would probably need a custom solution.
I don't think it's that people have intentionally avoided it; I'd say it's just that no-one has gotten around to it.

>>32236
I run their warrior, but for the most part I just act alone, and if the shit ever hits the fan for a site, I drop into their IRC, tell them to stop panicking, and find somewhere with a fast connection to upload from for a couple days.
Jason Scott is something of a hero of mine, I've been ripping random shit for years, some of which I at one point or another had the only known existing copy of, before I uploaded it.
I just view myself as an independent node and rogue agent. Sometimes it pays off.

>>32202
>Downloads are throttled so downloading from one connection literally takes years.
Welcome to the world of spinning up 1000 identical Google Cloud VMs, each with 15GB storage, and ripping the entire site in a day onto mega, before trickle-downloading it locally.
Hell, you can probably write scripts to do it for you, [spin up VM with (config), download posts (x-y), transfer to mega], and run that 1000 times in parallel.
Just keep running until the overall site download speed halves, then send a $100 donation to keep everyone happy.
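If anyone did script it, the sharding part at least is trivial; something like this (a hand-wavy Python sketch, with the cloud/mega plumbing left out and download_posts standing in for whatever does the real work):

# Hand-wavy sketch of the sharding logic only: split the post ID space into
# N ranges and hand each range to a worker. Here the workers are threads on
# one box; in the plan above each "worker" would be its own cloud VM.
from concurrent.futures import ThreadPoolExecutor

def shard(lo, hi, n_workers):
    step = (hi - lo) // n_workers + 1
    return [(start, min(start + step - 1, hi)) for start in range(lo, hi + 1, step)]

def run_parallel(download_posts, lo=1, hi=2_346_414, n_workers=1000):
    ranges = shard(lo, hi, n_workers)
    # Cap local concurrency so one machine doesn't hammer the site; separate VMs wouldn't need this.
    with ThreadPoolExecutor(max_workers=32) as pool:
        list(pool.map(lambda r: download_posts(*r), ranges))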
>It would be really nice if other people also donated and donated before the deadline.
I'll donate my fair share for as long as it takes to get the site fully archived, then maybe drop to $20-30 a month to sustain it.

>>32244
On second thought, for shits and giggles I'll make a backup of the forum now; I can leave it running until I've worked out a way to pull the rest of the site.

>>32285
we need to save BUI's chanfaggotry (if this is you BUI then no offence here m8)
so we can show everyone the truth

>>32284 you mean archive.is

>>32191
>Ahh, a fellow Aussie :)
Well now I'm just scared you'll recognize me.
(Full transparency if you do: Not a furry, just sexually aroused by large amounts of data)
>it's always interesting seeing people complain about the donations, its feels expensive but the hosting actually seems fairly robust and reasonable for that price
Compared to what hosting cost even 5 years ago, 12TB of storage with unlimited bandwidth for $160 a month is crazy.
You could hardly even get a redundant couple terabytes with a few TB of bandwidth for that not long ago.
>and for that amount of money nobody would develop and maintain this site if they were after a profit, it's definitely somebody's hobby.
It's the sorta thing I'd build and run on my home server, and if anyone but me wanted to use it, well, they'd all have to share my 1MBit/s upload.
(Which is why I find somewhere faster if I need to upload a few TBs)
>checkout 'ozbargain' for deals specific to Aus if you haven't
Usually I see the deals there after they've finished; I'll see if I can set up a notification or something for anytime hard drives pop up there.
A SAN is good, but so are some nice raw disks.
>I'm in the process of just archiving straight to the cloud, the 'external' sites that I archive never hit my local storage
A word of warning: treat all those accounts as stopgaps at best.
I recently had my Mega account YEET'd just because I imported an unsavory link (a certain PDF), because I archive everything. No Bueno.
Be warned, and if you're using Mega, never import any links ever on your archival account.
If you wanna be safe, start a google cloud instance ($300 free credit on signup) and locally mount an old and new mega account, and just transfer over. Don't even use their link system to import, not worth even having used it once.
Archiving to cloud accounts is like taking out credit cards to pay off other credit cards.

>>32289
xubuntu is ready now
how do i contribute? i still have 1tb free on my external HDD

>>32289
hopefully some cloud providers accept btc here

>>32191
>For what I want to do these are my requirements:
>- I want all the post data and metadata (minus comments)
>- I want the archive to be actually browseable
At first I'd rather get all content down and worry about metadata later, but it'd be nice to get it all in one pass.
My biggest issue would be 404's, and going back and getting un-404'd posts, along with non-404 posts which were later completely removed from the system (user-uploaded spam).
If keeping the spam means getting everything, I wouldn't mind, but something that did it all would be good.
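Something like a little ledger of 404'd URLs that gets retried every epoch, while never deleting anything locally, would probably cover it, e.g. (Python sketch only; the ledger file name and the fetch/dest_exists helpers are made up):

# Sketch of the 404 bookkeeping: never delete anything that's already archived,
# record URLs that failed, and retry them on every later pass in case they
# come back. fetch() and dest_exists() stand in for the actual downloader.
import json, os

LEDGER = "missing_urls.json"  # hypothetical ledger file

def load_ledger():
    if not os.path.exists(LEDGER):
        return set()
    with open(LEDGER) as f:
        return set(json.load(f))

def save_ledger(missing):
    with open(LEDGER, "w") as f:
        json.dump(sorted(missing), f)

def sync(urls, fetch, dest_exists):
    missing = load_ledger()
    for url in set(urls) | missing:
        if dest_exists(url):       # already archived locally: keep it even if the site 404s now
            missing.discard(url)
            continue
        if fetch(url):             # returns False on 404 and other failures
            missing.discard(url)
        else:
            missing.add(url)
    save_ledger(missing)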

>I can probably help you hack something together if you're commited to keeping an up to date full backup of this site
I can guarantee that I'll keep a full archive of the site with a mirrored backup, and that I would upload it to archive.org if this site ever went down; however, I couldn't promise that it'd be browsable in the meantime, as I don't have the upload speed to spare. Borrowing a high-speed link for a couple days to throw it up is no problem, but in the meantime, people could just use the site itself.
It'd probably be set-and-forget for me once the full site was downloaded, but I'd be happy to update the code if you added any features.

>Oh, and I suspect you might be on https://www.reddit.com/r/DataHoarder/ ?
>If not, you should check it out
I've had my fair share of laser cut plywood shuckmounts, don't you worry.

>>32288
>>32284 you mean archive.is
No, I meant .org
If .is has a copy, that'd be great, but exportability is a concern, and .is drops a shitload of the original data.
>>32292
>how do i contribute i still have 1tb free on my external HDD
Buy a pair of 10TB disks on ebay.

>>32192
>I can probably help you hack something together if you're commited to keeping an up to date full backup of this site
On this, if you'd rather work privately instead of posting your code publicly, my email address starts with "velleity" and is hosted on (or, I guess you could say "At") "outlook.com.au"
I'm a little busy for the next couple days (hence why I'm still posting at 4AM), but when I get home each day I'll be able to spend a couple hours working on it to get things rolling.

speaking of 300, how about 300TB, will that be OK?

>>32300
Enough to back up the site 25 times, yeah
(302 get maybe?

>>32302
GOT EEEM!)

>>32302
with that amount of data this isn't the only site you'll copy

I haven't heard from Archivist, if you're still there I'm still quite interested in doing a full backup.
velleity, something something AT, something something outlook.com.au
Email me nibba

>>33429
cmon maan just gib me da email (just seperate the at and .au)

>>33435
Bullshit, I'm not posting anything even remotely bot readable (fuck, that's just tempting some faggot to spam it everywhere)

>>33723
literally just create 2 protonmail addresses, seriously tho, they are free
also you were right about the spam, i have it too

>>33723
txt to image works fine too
also use a captcha generator with the address as text
if paranoid use text to speech

one day left

at this point it's more like 11TB, and that is also the reason for the donation spike

I could guarantee a full archive up to 20190516, but lately there are too many PSDs and other big files being added, so catching up is slightly slow

would've donated, but the whole crypto-buying process is a pain with verification and whatnot, and more sites look shady than trustworthy
since I can guess how the whole thing is set up, I wouldn't mind lending the storage and reducing the costs; wonder if admin would agree to it
when a server mounts that storage there shouldn't be any visible information about the mounting server, but you never know

Once you extract it you should put it into hydrus
https://hydrusnetwork.github.io/hydrus/

>>34348
okay boomer

>>49792
>okay boomer to new software
you're killing the dead horse even more.

Also please make sure to download the MEGA links too, so we don't lose those if they get taken down by the authors.

bump

I'm trying to make a full backup of the furry content (only) on this website. I'm excluding .avi and other video files; however, I'm still including everything else, most embeds, and 100% of post metadata.

So far I'm halfway there (counting by the number of files).

Let's not let the site shut down; donate now.

https://hydrusnetwork.github.io/hydrus/

Hello, I am interested in taking a full site backup from someone here who already has a nice backup going. I have plenty of free space. Are any of the old posters still here that I could reach out to? Already got that email on lock.

No, if anything it should be shared over bittorrent.

>>50868
Do you have matrix anon?

https://matrix.to/#/@foxo:foxo.me

What's the recommended scraper to archive the site? I'd want to help if possible. I've got around 10TB sitting around doing nothing.

>>50952
The issue is there are multiple link classes now.
There's class=Card-Attachments, class=Card Posts, class=Attachments, the shared library, etc. etc.
On top of that, each Patreon creator has their own way of uploading. Some use Google Drive, some use RARs on random host sites, some use shared links to other sites.
Some just host in the card attachments. You also have to take into consideration about 100 file formats, from .gif to .mp4 to .png to .psd and .rar, etc. etc.

So scraping becomes extremely difficult, as there are so many things you need to consider for even ONE creator, let alone thousands.
If you want to try it out, I say pick a date and time; any new uploads or updates after that date and time you forget about. You can't possibly keep a live backup of the site, so start from whatever date and forget new uploads.

>>50864
this

>>50954
So, in short, step one is to download all attachments/embeds in Patreon posts (which, as far as I remember, are essentially limited to images), then parse the text of every post for links, then automatically download the content of those links.
I think at this stage the goal is simply duplicating yiff.party, so if a post just had text containing a link, then all that would be archived is that post, as text, containing a link.
Going and downloading what the link is pointing to seems secondary. Perhaps a feature to add into yiff.party itself sometime.
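The "parse the text for links" stage is the dumb part anyway; roughly something like this (a Python sketch, not tied to any particular scraper; the host list is only an example filter):

# Rough sketch of the secondary pass: scan archived post text for external
# links worth grabbing later. The host list is only an example.
import re

LINK_RE = re.compile(r'https?://[^\s"<>)]+')
EXTERNAL_HOSTS = ("drive.google.com", "mega.nz", "dropbox.com",
                  "imgur.com", "sta.sh", "gfycat.com")

def external_links(post_text):
    return [url for url in LINK_RE.findall(post_text)
            if any(host in url for host in EXTERNAL_HOSTS)]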

>>51014
Building on that, the best way is to find the URLs of all the Patreon creators on this site.
After you have found those, you need a link extractor to extract URLs from webpages. If you have one that can be automated over every URL in a text file, it will save you hundreds of hours.

Essentially you want your URLs to look like what https://hackertarget.com/extract-links/ presents in its extracted links
You are looking for 1 of 2 specific lines out of every link: either "https://yiff.party/patreon_data/......" and/or "https://yiff.party/patreon_inline/...."
Patreon_data and Patreon_inline store every single downloadable file. You don't even need to care about extensions.
All downloadable files are located under these 2 paths. All .jpg, .png, .psd, .mp4, etc. files are in them, so don't worry about what types of files it will download.

Just as a heads up, every creator on the site will have "https://yiff.party/patreon_data/" links for their downloadable files.
Only some have "https://yiff.party/patreon_inline/" as well, so I wouldn't worry too much about the patreon_inline downloads.
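In Python terms that boils down to something like this (a rough sketch; it assumes you've already saved the creator page HTML and that the links are absolute):

# Sketch of the link extraction described above: pull every patreon_data /
# patreon_inline URL out of a saved creator page. Assumes absolute URLs;
# adjust if the page uses relative paths.
import re

FILE_RE = re.compile(
    r'https?://yiff\.party/(?:patreon_data|patreon_inline)/[^\s"\'<>]+')

def file_urls(page_html):
    return sorted(set(FILE_RE.findall(page_html)))

# Feed the resulting list to whatever downloader you like (e.g. wget -i urls.txt).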

And don't forget https://yiff.party/shared_data/

List of all furry files:
https://foxo.me/all.csv.xz

>>50900
Telegram > Discord > Matrix > Slack

>>52367
Thanks for your opinion

>>52414
Telegram > Discord opsec-wise; Matrix is good on opsec but bad on usability, and Slack is an all-round dud.

>>51174
I'm getting a 403 error.
This is just an additional location that data is stored in, right? Nothing more than "https://yiff.party/patreon_data/" and "https://yiff.party/patreon_inline/"?
