/t/ - Technology

Discussion of Technology

8chan.moe is a hobby project with no affiliation whatsoever to the administration of any other "8chan" site, past or present.

(43.05 KB 618x656 ChannelChangerLogo_Avatar.png)
(133.69 KB 1510x924 ChannelChangerLogo.png)
ChannelChanger Development & Support Anonymous 09/06/2020 (Sun) 18:37:48 No. 1257
This is the official development and support thread for ChannelChanger. Please request help, post bugs, or offer suggestions here.
What is ChannelChanger?
A cross-platform, multi-site scraper and importer. It allows anyone to back up a board and then import it to their own website. https://gitgud.io/Codexx/channel_changer
What do I need to run this?
Python 3.8+ and most of the dependencies listed in requirements.txt. A basic set-up guide is provided in the readme. This software was developed and tested exclusively on Linux. I intend to support both OSX and Windows. If you use either of these platforms and encounter any issues, please let me know.
Can I scrape a board from [site] with this?
Probably. There is explicit support for LynxChan, Vichan, and JSChan websites. Some vichan sites may have issues with thumbnails because their APIs do not expose thumbnail extensions; I have added an override, but you may need to run two scrapes of boards on some sites to get all of the thumbnails. Vichan's API matches 4chan's with some extensions, so the scraper might work on other sites which clone the 4chan API, but this is untested. Many vichan sites have customized frontends, such as OpenIB, Lainchan, or Kissue. I've tested and confirmed these work, but can't always guarantee full compatibility with each of these, especially if they decide to alter the API or where files are stored. LynxChan sites should work fine, since the direct paths for both the thumbnail and the file are in the JSON. JSChan works, but its API is presumably unstable. If it changes, please alert me and I will make the necessary tweaks.
Can I import these boards to my own website?
Sure, but for the moment only importing to LynxChan boards from LynxChan or Vichan sites has any support. Importing is currently undergoing a heavy refactor. Once it is done, it will be possible to import from any board to a LynxChan website. Imports to other imageboard engines are planned.
Can I view the board offline?
Easily? No, but I am looking into an option to do this. You will have a local copy of the threads and files, but the data is not modified for local viewing.
I will continue to iterate and refactor. The code is a bit of a mess at the moment, but I plan to simplify it and make it PEP8-compliant soon. It's very likely there are still some big kinks to work out. Your feedback is incredibly valuable!
(43.59 KB 467x413 nice desu ne (2).jpg)
>>1257 Can the average anon scrape boards, or do they need to own the target board/site in order to scrape everything? How much strain does this cause in the target site?
>>1259 It just scrapes what is publicly available. Anyone can do it on any supported site. It should get everything; let me know if it chokes or misses anything. No more of a strain than a single anon clicking on and reading every single thread on the board and expanding all images. That's not much of a hit. It also only scrapes files that are missing, so that will minimize scrape time and server bandwidth.
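The "only scrapes files that are missing" behaviour can be sketched roughly as below. This is a minimal illustration, not ChannelChanger's actual code: `download_if_missing` and its `fetch` callback are hypothetical names, and the only assumption is that a file already on disk is treated as complete.

```python
from pathlib import Path
from typing import Callable


def download_if_missing(dest: Path, fetch: Callable[[], bytes]) -> bool:
    """Write fetch() to dest only when dest does not exist yet.

    Returns True if the file was downloaded, False if it was skipped.
    (Hypothetical helper for illustration; not part of ChannelChanger.)
    """
    if dest.exists():
        # Already on disk from an earlier scrape, so no request is made.
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(fetch())
    return True
```

On a re-scrape, only the small JSON and HTML pages would be requested again; every file that already exists is skipped without touching the server.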
>channel Christ >There is explicit support for JSChan Ok this is based. Blacked.moe is ultra gay but you're cool codexx.
>>1257 So basically it lets you copy and paste boards?
>>1260 >That's not much of a hit Care to elaborate? You keep comparing this tool with various 4chan scrapers, but 4chan's servers are dozens of times larger than any other imageboard's and probably have features like load balancing to reduce the strain. If configured wrong, a tool like your scraper could take down a small site by accident. Another difference between this tool and things like 4chan's archives is that they use 4chan's dedicated API, which limits the amount of content they can scrape (just the text and images in the posts instead of requesting the entire page every time something changes), while your tool requests and downloads everything (at least judging by your description). There are related ethical problems, like bad actors using your tool to clone a site for malicious purposes (stealing users, scamming advertisers, bloating the target's bandwidth, etc), but that's something to be expected with this kind of tool.
>>1263
>site performance problems
Assuming a good-faith actor, the worst case would be transferring a copy of every file on the server and a copy of each thread. That sounds like a lot, and it is, but even small sites (such as this one and the webring) handle that kind of data transfer regularly. Vanwa removed a lot of their performance and traffic statistics, but this site fulfills hundreds of requests a second even during off-hours. Even our $5 VPS test server handles being scraped just fine, although it's not also handling other traffic. In terms of bad actors, yes, you could just have this tool constantly make requests. But a DDoS tool would be far more effective and use up less of your own bandwidth in the process. I just don't think it's well-suited for this purpose. In short, I don't think it can take down sites by accident unless used maliciously, and someone with malicious intent has better tools to accomplish the same thing.
>4chan dedicated API
I actually don't know what 4chan scrapers are out there and haven't compared this one to them. I only know Vichan's API is a superset of 4chan's. Everything I grab is from the public, dedicated JSON API these sites provide, and then I also grab the HTML pages on top of that for the sake of posterity (and I think I might need them when I write the vichan importer). It will re-request a page on re-scrape, but the size of a JSON or HTML request is peanuts and it will not re-scrape files.
>related ethical problems... bad actors... clone sites for malicious purposes
Yes, I am concerned with this, too. Primarily, angry users forking boards because the mods deleted their post or pissed them off in some way. But my tool doesn't do anything these people couldn't do themselves. It only lowers the barrier to entry a bit. The alternative is to keep the source closed and just advertise it to migrating boards as an option, but I do believe in free software and I also believe that, should I get hit by a bus tomorrow, other anons should have the ability to restore boards and sites that get deplatformed. This tool gives them the capability to do that. It's up to anons to not be "stolen" by other websites just because the posts are cloned. Ultimately, it's how you use it. But I think it would be unethical to just keep this tool private.
>>1262 Yes.
>>1264 Thanks for the clarification. I was a bit worried about how this tool could backfire and cause more damage than good. One last thing, is there a risk of accidentally triggering Vanwa/Cloudflare's DDoS protection and getting your IP banned by them for using this tool?
>>1265 Potentially, but there's some mitigation. The user-agent is spoofed to Firefox and if people run into issues I could implement user-agent randomization on a per-request basis, which would probably throw this off. Cloudflare's bot-check page would likely be an issue if enabled. Nothing to be done about that, really. I ran into throttling with 8kun very early on, but since implementing the user-agent spoofing it hasn't been an issue, and I scraped two of their largest boards back-to-back multiple times.
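For reference, per-request user-agent randomization could look something like the sketch below. It uses only the standard library so it stands alone. The `USER_AGENTS` pool and `build_request` are made up for illustration and are not taken from ChannelChanger, which may build its requests differently.

```python
import random
import urllib.request

# Hypothetical pool of Firefox-style user-agent strings (illustrative only).
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:80.0) Gecko/20100101 Firefox/80.0",
]


def build_request(url, rng=None):
    """Return a Request for url carrying a randomly chosen User-Agent.

    rng may be a random.Random instance for reproducible choices;
    by default the module-level generator is used.
    """
    chooser = rng if rng is not None else random
    user_agent = chooser.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The returned object can be passed to `urllib.request.urlopen`. None of this helps against a bot-check page that requires executing JavaScript or solving a captcha.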
>>1266 >Cloudflare's bot-check page would likely be an issue if enabled. Nothing to be done about that, really. There are two bot check pages. The first is a simple matter of waiting and executing some javascript. This only stops the lowest effort automation. The second involves a captcha, where your options are to either solve it or try again with a different IP address.
>>1259 Ask any of the 4chan archivers for details.
As a turbo CL linux brainlet someone explain what I'm doing wrong.
python change-channel.py -s 8chan.moe -b test -o test
  File "change-channel.py", line 92
    print('Unable to download ' + ('thumbnail' if thumbnail else 'image') + f' {url.split("/")[-1]}. Skipping...')
    ^
SyntaxError: invalid syntax
>>1271 sometimes invoking python on some gnu/linux distros grabs python2 rather than python3. You can state python3 explicitly and double-check the version with the '-V' flag (python3 -V, not python3 change-channel.py -s 8chan.moe -b test -o test -V)
>>1272 Looks like my nigger distro doesn't have python 3.8, probably the issue
>>1271 >>1272 >>1273 That's a syntax error on an f-string, so it does indeed look like the issue is Python 3.8 is not installed, or at least isn't being called with your python3 alias. Debian and Ubuntu can use the Deadsnakes PPA, but they only package 3.9 because you can technically get 3.8 with a dist-upgrade. You can also compile from scratch. It's a straightforward process. Just make note of where the binary is installed. Bear in mind you'll need to install pip and dependencies per-version of Python. Using a virtual environment helps a lot with managing this. Please let me know if you need any further help.
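A quick way to confirm which interpreter actually runs is a version guard like the sketch below. `version_error` is a hypothetical helper, not something in the repository. It deliberately avoids f-strings and annotations so that an old interpreter prints a readable message instead of another SyntaxError.

```python
import sys

MIN_VERSION = (3, 8)  # minimum stated in the opening post


def version_error(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return an error message if the interpreter is too old, else None."""
    found = tuple(version_info[:2])
    if found >= minimum:
        return None
    return "Python %d.%d or newer is required; found %d.%d" % (minimum + found)


if __name__ == "__main__":
    message = version_error()
    if message:
        sys.exit(message)
```

Saved to its own file and run with the same command used for the scraper (python, python3, python3.8), it shows which of those names point at a new enough interpreter.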
>>1274 I compiled python 3.8 and after getting each module it bitched at me about, I got it to work. Probably went about this all the wrong way but in the end it worked. Works on some boards/sites fine but getting errors for others on zzzchan like this
python3.8 change-channel.py -s zzzchan.xyz -b b -o b
Threads |██▊⚠ | (!) 3/42 [7%] in 1.8s (1.68/s)
Traceback (most recent call last):
  File "change-channel.py", line 600, in <module>
    scrapeJschanBoard(args, json_catalog, output_root)
  File "change-channel.py", line 233, in scrapeJschanBoard
    for result in results:
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 865, in next
    raise value
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "change-channel.py", line 194, in scrapeThread
    thumb_loc = f'/file/thumb-{file_data["hash"]}{file_data["thumbextension"]}'
KeyError: 'thumbextension'
>>1276 I thought I fixed that; turned out I forgot to apply the fix to threads and not just posts. I've pushed an update which should rectify the issue. It should now just fail to download thumbnails for those files. Not very elegant, and I'd like to revisit it eventually, but I was able to complete an entire scrape with no errors. I show 2,343 images in the src/ folder for z/v/, compared with the original list of 2,396. If you can find anything missing that is available on the site, let me know and I will investigate. Thank you for the bug report!
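The general shape of such a fix is to treat the key as optional instead of indexing it directly. The sketch below only illustrates that idea and is not the actual patch. `thumb_location` is a hypothetical name, and the path format is copied from the traceback above.

```python
from typing import Optional


def thumb_location(file_data: dict) -> Optional[str]:
    """Return the thumbnail path, or None when no extension is exposed.

    Hypothetical helper; the path format mirrors the f-string shown in
    the traceback, but this is not ChannelChanger's real function.
    """
    ext = file_data.get("thumbextension")
    if not ext:
        # The API did not say what the thumbnail is called, so skip it
        # rather than raise KeyError and abort the whole scrape.
        return None
    return f'/file/thumb-{file_data["hash"]}{ext}'
```

A caller would then skip the thumbnail download whenever the result is `None` and carry on with the full-size file.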
>>1279 Seems to just hang on lynxchan.net boards and doesn't do anything.
>>1280 Their site uses the www subdomain and redirects requests without it, but their certificate only covers URLs with the subdomain included. I actually encountered a similar issue with some of lainchan's alternate domains, which the certificate is not properly configured for. I almost removed the validation check entirely, but the requests library screams bloody murder about insecure requests, so I reverted. I am able to scrape as long as I include the www in the site argument. For example: ./change-channel -s www.lynxchan.net -b lynx -o lynx -j 8 Most hangs are failure to resolve the URL, although some are occasionally caused by multithreaded scrapes failing to release locks. I'll see if I can't add an explicit error when this happens, though.
>>1281 Figured that was it but I guess I unnecessarily added the https shit too which threw me for a loop.
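One way a script could tolerate a pasted URL in the site argument is to reduce it to the bare host first. This is only a sketch of that idea. `normalize_site` is a hypothetical helper, and nothing in the thread says ChannelChanger does this.

```python
from urllib.parse import urlsplit


def normalize_site(site: str) -> str:
    """Return just the host part of a site argument.

    Accepts either a bare host or a full URL. (Hypothetical helper.)
    """
    site = site.strip()
    # urlsplit only fills in netloc when a "//" separator is present,
    # so add one for bare hosts such as "www.lynxchan.net".
    parts = urlsplit(site if "//" in site else "//" + site)
    return parts.netloc
```

With this, `https://www.lynxchan.net/` and `www.lynxchan.net` would both come out as the same host. It does not add a missing www, so the certificate issue described above still applies.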
>>1257 What about using it through Tor? >inb4 scraping through Tor bad not anymore, Tor now holds many connections easily
>>1286 I haven't tested it. But assuming you're routing all traffic through Tor, I don't see why a request would fail. As long as your network can resolve an address and handle requests/responses, it should work. Give it a try and if there are any issues you can report them.
JSchan importing when
>>1465 I'm focusing on the next LynxChan upgrade right now. I'll pick up development of ChannelChanger once that is done.
>>1475 Okay thank you lain tranny
>>1475 Is there a rough estimate for when this will be done? Not trying to pester just need to account for potential board migrations and when/how they might occur. I wouldn't expect you to prioritize off-site migrations, or care much at all about them.
>>1538 LynxChan 2.5 RC1 launches October 17th. It will be no sooner than that. With some luck, the update will happen at the end of the month. Is this regarding migrating /r9k/ from the zchan database to zzzchan? If so, I was planning to reach out to Sturgeon about that once my plate was clear. If this is regarding another move, or if you're the admin of a different JSChan site, I'd encourage you to reach out to me via e-mail; I will need a few active JSChan users to lend me their eyes for bug hunting.
>>1539 >Is this regarding migrating /r9k/ from the zchan database to zzzchan? It is about migrating /r9k/, but from lynxchan.net to zzzchan. Unless I'm mistaken sturgeon does not have access to the original zchan database. A month from now is a better estimate than I was expecting. If you think it'll be possible in November it may be worth waiting.
>>1540 A month from now is when I start working on ChannelChanger actively again. If you want the posts from zchan and the admin is willing to either share the backup or host it long enough to grab a scrape then you could also import those. Merging the zchan and lynxchan boards wouldn't be straightforward, but would be possible. Either way, I'll try to get it done by the end of November. Can't promise anything, though. I'd be willing to help with the migration when it does happen. I'd still recommend you contact me via e-mail; board owners have a good sense for when something on their board is broken, and since I haven't imported to JSChan before I think having you look over everything beforehand would be best.
>>1542 Alright, I'll have a think on it for a while.
I've pushed an update which allows users to ignore invalid certificates. This will solve both the invalid-certificate-after-redirect issue and sites being inaccessible due to expired certificates.
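As a rough standard-library illustration of what ignoring invalid certificates means: the sketch below builds a TLS context with verification switched off on request. `make_ssl_context` and its parameter are hypothetical names, and ChannelChanger's own option may be implemented differently (for example through the requests library mentioned earlier in the thread).

```python
import ssl


def make_ssl_context(ignore_invalid_certs: bool = False) -> ssl.SSLContext:
    """Build a TLS context, optionally with certificate checks disabled.

    Hypothetical helper for illustration only.
    """
    ctx = ssl.create_default_context()
    if ignore_invalid_certs:
        # Order matters: hostname checking must be turned off before
        # verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

The context can be handed to `urllib.request.urlopen(url, context=ctx)`. Disabling verification also removes protection against a man-in-the-middle, so it is best kept as an opt-in for sites with known certificate problems.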
(86.47 KB 1024x1024 1586634477331.jpg)
Yet another project that seems interesting, written in that fucking retarded faggot shit language. Fuck.
>>1597 Using a mediocre language to get shit done is still miles better than getting nothing done even while endlessly trying to put others down as they fail to meet your impossible standards, standards nobody cares for outside of you.
>>1597 What other better language could be used to do something like this with ease?
>>1598 >impossible standards Choosing a non- or less- retarded and/or gay language to do a project is not impossible. >trying to put others down Not trying to put anyone else down mate, just lamenting the fact. >nobody cares for outside of you Only few do care, and that's the main reason why things are how they are. >>1600 Ruby or Perl could be used to do something like this with ease. The former is less pozzed, the latter virtually AIDS-free.
>>1617 This is possible on LynxChan sites because of the "transfer threads" feature which can move threads between boards and dynamically re-numbers them. Building this into the tool itself for use on any website would be a pain, but it's theoretically doable.
>The 9th Circuit has defended the right to scrape publicly-accessible data what utter faggotry it's up to the server admin to permit that
>>1542 >>1545 I'm going to be opting to migrate r9k to zzzchan sooner rather than later. So don't concern yourself over time frames for development or whatever, you won't have me waiting on you.
The political posts are off-topic and have been moved to >>1631.
(378.66 KB 400x358 sonic the hedgehog.gif)
I have thus far failed to install Python on my Linux machine such that it will allow the Lynxchan board export script to function. Is a more portable version of that utility upcoming, or should I continue hacking away at it?
Can you explain your problem exactly? How have you tried to install it, and what issues are you having with it? I do plan to make a standalone executable version eventually, but I want it to be feature-complete first and importing still needs a revamp. Also, I'm going to move this thread to the ChannelChanger general on >>>/t/ after you reply again. Sonic Jam was great
Your gif is actually a fucking webp. You just named it .gif, retard.
Merged the thread from >>>/site/ into here. >>1995 If you continue to have issues with installing Python then I can help if you provide more information.
>>1257 So is the project dead? I was waiting for the feature to import to Vichan, which would allow archiving any Vichan board and making a read-only version.
Pushed an update to rectify an oversight in DB versioning. File hashes were updated but one of the identifier fields was not. >>2099 No, but between the holidays, the main site, the streaming site, and some other projects my hands have been full. I released this when it hit a minimum viable state. I will need to learn the layout of the Vichan database, possibly accounting for the most popular forks. If you are familiar with administrating Vichan servers, please drop me a line. You may be able to expedite the process. >>2125 Are you facing difficulty scraping from Cloudflare protected sites? If so, let me know which sites are causing a problem. Even on a VPN I've been able to take test scrapes from every webring site.

