Wikipedia talk:Database download


Please note that questions about the database download are more likely to be answered on the xmldatadumps-l or wikitech-l mailing lists than on this talk page.

Semi-protected edit request on 1 March 2023

Unfortunately, the link to the wiki-as-ebook store is no longer available. The link needs to be removed:

"E-book: The wiki-as-ebook store provides ebooks created from a large set of Wikipedia articles with grayscale images for e-book readers (2013)." Tibor Brink (talk) 15:44, 1 March 2023 (UTC)

Done -- John of Reading (talk) 16:34, 1 March 2023 (UTC)

Update: this is still listed as an option in the "Offline Wikipedia Reader" list at the top, and that should be removed as well. — Preceding unsigned comment added by Dfhci (talk · contribs) 14:09, 17 March 2024 (UTC)

@Dfhci: Also Done -- John of Reading (talk) 15:45, 17 March 2024 (UTC)

How to use multistream?

The "How to use multistream?" shows

" For multistream, you can get an index file, pages-articles-multistream-index.txt.bz2. The first field of this index is the number of bytes to seek into the compressed archive pages-articles-multistream.xml.bz2, the second is the article ID, the third the article title.

Cut a small part out of the archive with dd using the byte offset as found in the index. You could then either bzip2 decompress it or use bzip2recover, and search the first file for the article ID.

See https://docs.python.org/3/library/bz2.html#bz2.BZ2Decompressor for info about such multistream files and about how to decompress them with python; see also https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dumps/+/ariel/toys/bz2multistream/README.txt and related files for an old working toy.

"

I have the index and the multistream file, and I can make a live USB flash drive with https://trisquel.info/en/wiki/how-create-liveusb

lsblk

umount /dev/sdX*

sudo dd if=/path/to/image.iso of=/dev/sdX bs=8M; sync

but I do not know how to use dd well enough to "Cut a small part out of the archive with dd using the byte offset as found in the index." and then "You could then either bzip2 decompress it or use bzip2recover, and search the first file for the article ID."

Is there any video or more information on Wikipedia about how to do this, so I can look at Wikipedia pages, or at least their text, offline?

Thank you for your time.

Other Cody (talk) 22:46, 4 December 2023 (UTC)

The forum thread https://trisquel.info/en/forum/how-do-you-cut-wikipedia-database-dump-dd has someone called Magic Banana with information about how to do this, and maybe others as well. Other Cody (talk) 15:44, 26 January 2024 (UTC)


A tool for a similar multistream compressed file was written for xz compression and lives at https://github.com/kamathln/zeex . This will give a preliminary idea and could be adapted for bz2 as well. kamathln (talk) 12:21, 22 January 2025 (UTC)

Download Wikipedia to PC offline

I would like to ask all of you how I can download the wiki for PC. 152.206.175.101 (talk) 19:35, 20 April 2024 (UTC)

How many "multiple" is "These files expand to multiple terabytes of text." - 4TB Drives are...

...cheap as chips.

In early 2025, a 4 TB hard disk drive is $70 USD while a 4 TB SSD is just $200, and 24 TB drives are under $500...

It's clear that the "current version only" dump expands to just 0.086 TB. Can anyone clarify whether the "multiple" a few lines below that means expanding to 2 TB or 200 TB? Jonathon Barton (talk) 06:17, 16 February 2025 (UTC)