search

As part of theHTTP Archiveproject, I create MySQL dumps for each crawl (on the 1st and 15th of each month). You can access the list of dumps from thedownloads page. Several people use these dumps, most notablyIlya Grigorik who imports the data into Google BigQuery.

For the last year I’ve hesitated on many feature requests because they require schema changes. I wasn’t sure how changing the schema would affect the use of the dump files that preceded the change. This blog post summarizes my findings.

Format

When I started the HTTP Archive all the dumps were exported in MySQL format using a command like the following:

mysqldump --opt --skip-add-drop-table -u USERNAME -p -h SERVER DBNAME TABLENAME | gzip > TABLENAME.gz

These MySQL formatted dump files are imported like this:

gunzip -c TABLENAME.gz | mysql -u USERNAME -p -h SERVER DBNAME

People using databases other than MySQL requested that I also export in CSV format. The output of this export command is two files: TABLENAME.txt and TABLENAME.sql. The .txt file is CSV formatted and can be gzipped with a separate command.

mysqldump --opt --complete-insert --skip-add-drop-table -u USERNAME -p -h SERVER -T DIR DBNAME TABLENAMEgzip -c DIR/TABLENAME.txt > DIR/TABLENAME.csv.gz

This CSV dump is imported like this:

gunzip DIR/TABLENAME.csv.gzmysqlimport --local --fields-optionally-enclosed-by="/"" --fields-terminated-by=, --user=USERNAME -p DBNAME DIR/TABLENAME.csv

The largest HTTP Archive dump file is ~25G unzipped and ~3G gzipped. This highlights a disadvantage of using CSV formatted dumps: there’s no way to gzip and ungzip in memory. This is because themysqlimport command uses the filename to determine which table to use – if you piped in the rows then it wouldn’t know the table name. Unzipping a 25G file can be a challenge if disk space is limited.

On the other hand, the CSV import is ~30% faster than using the MySQL format file. This can save over an hour when importing 30 million rows. The HTTP Archive currently provides dumps in both MySQL and CVS format so people can choose between less disk space or faster imports.

Forward Compatibility

My primary concern is with the flexibility of previously-generated dump files in light of later schema changes – namely adding and dropping columns.

Dump files in MySQL format work fine with added columns. The INSERT commands in the dump are tied to specific column names, so the new columns are simply ignored. CSV formatted dumps are less flexible. The values in a row are stuffed into the table’s columns in order. If a new column is added at the end, everything works fine. But if a column is added in the middle of the existing columns, the row values will all shift one column to the left.

Neither format works well with dropped columns. MySQL formatted files will fail with an “unknown column” error. CSV formatted files will work but all the columns will be shifted, this time to the right.

Takeaways

I now feel comfortable making schema changes without invalidating the existing dump files provided I follow these guidelines:

  • don’t drop columns– If a column is no longer needed, I’ll leave it in place and modify the column definition to be a small size.
  • add columns at the end– I prefer to organize my columns semantically, but all new columns from this point forward will be added at the end.

I’ll continue to create dumps in MySQL and CSV format. These guidelines ensure that all past and future dump files will work against the latest schema.

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
How to solve the problem of mysql cannot open shared libraryHow to solve the problem of mysql cannot open shared libraryMar 04, 2025 pm 04:01 PM

This article addresses MySQL's "unable to open shared library" error. The issue stems from MySQL's inability to locate necessary shared libraries (.so/.dll files). Solutions involve verifying library installation via the system's package m

Reduce the use of MySQL memory in DockerReduce the use of MySQL memory in DockerMar 04, 2025 pm 03:52 PM

This article explores optimizing MySQL memory usage in Docker. It discusses monitoring techniques (Docker stats, Performance Schema, external tools) and configuration strategies. These include Docker memory limits, swapping, and cgroups, alongside

How do you alter a table in MySQL using the ALTER TABLE statement?How do you alter a table in MySQL using the ALTER TABLE statement?Mar 19, 2025 pm 03:51 PM

The article discusses using MySQL's ALTER TABLE statement to modify tables, including adding/dropping columns, renaming tables/columns, and changing column data types.

Run MySQl in Linux (with/without podman container with phpmyadmin)Run MySQl in Linux (with/without podman container with phpmyadmin)Mar 04, 2025 pm 03:54 PM

This article compares installing MySQL on Linux directly versus using Podman containers, with/without phpMyAdmin. It details installation steps for each method, emphasizing Podman's advantages in isolation, portability, and reproducibility, but also

What is SQLite? Comprehensive overviewWhat is SQLite? Comprehensive overviewMar 04, 2025 pm 03:55 PM

This article provides a comprehensive overview of SQLite, a self-contained, serverless relational database. It details SQLite's advantages (simplicity, portability, ease of use) and disadvantages (concurrency limitations, scalability challenges). C

How do I configure SSL/TLS encryption for MySQL connections?How do I configure SSL/TLS encryption for MySQL connections?Mar 18, 2025 pm 12:01 PM

Article discusses configuring SSL/TLS encryption for MySQL, including certificate generation and verification. Main issue is using self-signed certificates' security implications.[Character count: 159]

Running multiple MySQL versions on MacOS: A step-by-step guideRunning multiple MySQL versions on MacOS: A step-by-step guideMar 04, 2025 pm 03:49 PM

This guide demonstrates installing and managing multiple MySQL versions on macOS using Homebrew. It emphasizes using Homebrew to isolate installations, preventing conflicts. The article details installation, starting/stopping services, and best pra

What are some popular MySQL GUI tools (e.g., MySQL Workbench, phpMyAdmin)?What are some popular MySQL GUI tools (e.g., MySQL Workbench, phpMyAdmin)?Mar 21, 2025 pm 06:28 PM

Article discusses popular MySQL GUI tools like MySQL Workbench and phpMyAdmin, comparing their features and suitability for beginners and advanced users.[159 characters]

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

Repo: How To Revive Teammates
1 months agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)