Migrating Devnet/Mainnet Archive to Berkeley Archive
Before you start the process to migrate your archive database from the current Mainnet or Devnet format to Berkeley, be sure that you:
- Understand the Archive Migration
- Meet the foundational requirements in Archive migration prerequisites
- Have successfully installed the archive migration package
Migration process
The Devnet/Mainnet migration can take up to a couple of days, so the process is split into three stages:
Stage 1: Initial migration
Stage 2: Incremental migration
Stage 3: Remainder migration
Each stage has three migration phases:
- Phase 1: Copying data and precomputed blocks from the Devnet/Mainnet database using the berkeley_migration app
- Phase 2: Populating the new Berkeley tables using the replayer app in migration mode
- Phase 3: Additional validation of the migrated database
The migration involves two databases:
- The source database with the original Devnet/Mainnet data
- The migrated database with the original Devnet/Mainnet data converted to the Berkeley schema
Review these phases and stages before you start the migration.
Simplified approach
For convenience, use the berkeley_migration.sh script if you do not need to delve into the details of the migration or if your environment does not require a special approach.
Stage 1: Initial migration
```sh
mina-berkeley-migration-script \
initial \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 50 \
--checkpoint-output-path . \
--precomputed-blocks-local-path . \
--network NETWORK
```
where:
- `-g | --genesis-ledger`: path to the genesis ledger file
- `-s | --source-db`: connection string to the database to be migrated
- `-t | --target-db`: connection string to the database that will hold the migrated data
- `-b | --blocks-bucket`: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `-bs | --blocks-batch-size`: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.
- `-n | --network`: network name (`devnet` or `mainnet`) when determining precomputed blocks. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `-c | --checkpoint-output-path`: path to folder for replayer checkpoint files
- `-l | --precomputed-blocks-local-path`: path to folder for on-disk precomputed blocks location
The command outputs a `migration-checkpoint-XXX.json` file that is required for the next run.
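If you script the hand-off between stages, a small shell sketch can select the newest checkpoint for the incremental stage. This assumes checkpoints were written to the current directory, as with `--checkpoint-output-path .` above:
```sh
# Sketch only: pick the most recent checkpoint produced by the initial run.
# The file-name pattern is assumed from the examples in this guide.
LATEST_CHECKPOINT=$(ls -t migration-checkpoint-*.json | head -n 1)
echo "Resume the incremental stage from: $LATEST_CHECKPOINT"
```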
Stage 2: Incremental migration
```sh
mina-berkeley-migration-script \
incremental \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 50 \
--network NETWORK \
--checkpoint-output-path . \
--precomputed-blocks-local-path . \
--replayer-checkpoint migration-checkpoint-XXX.json
```
where:
- `-g | --genesis-ledger`: path to the genesis ledger file
- `-s | --source-db`: connection string to the database to be migrated
- `-t | --target-db`: connection string to the database that will hold the migrated data
- `-b | --blocks-bucket`: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `-bs | --blocks-batch-size`: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.
- `-n | --network`: network name (`devnet` or `mainnet`) when determining precomputed blocks. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `-r | --replayer-checkpoint`: path to the latest checkpoint file `migration-checkpoint-XXX.json`
- `-c | --checkpoint-output-path`: path to folder for replayer checkpoint files
- `-l | --precomputed-blocks-local-path`: path to folder for on-disk precomputed blocks location
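Between incremental runs, you can gauge how far the migrated database lags behind the source with a quick check. This is a sketch, assuming the standard archive `blocks` table with a `height` column:
```sh
# Compare the highest block in each database; the gap is what the next
# incremental run still has to migrate.
psql postgres://postgres:postgres@localhost:5432/source   -Atc "SELECT MAX(height) FROM blocks;"
psql postgres://postgres:postgres@localhost:5432/migrated -Atc "SELECT MAX(height) FROM blocks;"
```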
Stage 3: Remainder migration
```sh
mina-berkeley-migration-script \
final \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 50 \
--network NETWORK \
--checkpoint-output-path . \
--precomputed-blocks-local-path . \
--replayer-checkpoint migration-checkpoint-XXX.json \
-fc fork-genesis-config.json \
-f fork-state-hash
```
where:
- `-g | --genesis-ledger`: path to the genesis ledger file
- `-s | --source-db`: connection string to the database to be migrated
- `-t | --target-db`: connection string to the database that will hold the migrated data
- `-b | --blocks-bucket`: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `-bs | --blocks-batch-size`: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.
- `-n | --network`: network name (`devnet` or `mainnet`) when determining precomputed blocks. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `-r | --replayer-checkpoint`: path to the latest checkpoint file `migration-checkpoint-XXX.json`
- `-c | --checkpoint-output-path`: path to folder for replayer checkpoint files
- `-l | --precomputed-blocks-local-path`: path to folder for on-disk precomputed blocks location
- `-fc | --fork-config`: path to the fork genesis config file; this is the new genesis config that is distributed with the new daemon and published after the fork block is announced
- `-f | --fork-state-hash`: fork state hash
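Before the final run, it can be worth confirming that the announced fork block is present in your source database. A minimal sketch, assuming the standard archive `blocks` table with `state_hash`, `height`, and `chain_status` columns:
```sh
# An empty result means the fork block has not yet reached your archive.
psql postgres://postgres:postgres@localhost:5432/source -c \
  "SELECT height, chain_status FROM blocks WHERE state_hash = '<fork-state-hash>';"
```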
Advanced approach
If the simplified berkeley migration script is, for some reason, not suitable for you, you can run the migration using the berkeley_migration and replayer apps directly, without the interface the script provides.
Stage 1: Initial migration
This first stage requires only an empty database with the initial Berkeley schema and is the foundation for the next migration stage. It populates the migrated database and creates an initial checkpoint for further incremental migration.
Inputs
- Unmigrated Devnet/Mainnet database
- Devnet/Mainnet genesis ledger
- Empty target Berkeley database with the schema created, but without any content
Outputs
- Migrated Devnet/Mainnet database to the Berkeley format from genesis up to the last canonical block in the original database
- Replayer checkpoint that can be used for incremental migration
Phase 1: Berkeley migration app run
```sh
mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path . \
--keep-precomputed-blocks \
--network NETWORK
```
where:
- `--batch-size`: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.
- `--config-file`: path to the genesis ledger file
- `--mainnet-archive-uri`: connection string to the database to be migrated
- `--migrated-archive-uri`: connection string to the database that will hold the migrated data
- `--blocks-bucket`: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `--precomputed-blocks-local-path`: path to folder for on-disk precomputed blocks location
- `--keep-precomputed-blocks`: keep the precomputed blocks on disk after the migration is complete
- `--network`: the network name (`devnet` or `mainnet`) when determining precomputed blocks. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
Phase 2: Replayer in migration mode run
The replayer config must contain the Devnet/Mainnet ledger as the starting point, so you must first prepare the replayer config file:
```sh
jq '.ledger.accounts' genesis_ledger.json | jq '{genesis_ledger: {accounts: .}}' > replayer_input_config.json
```
where `genesis_ledger.json` is the genesis file used when a daemon bootstraps on the particular network.
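For reference, the resulting `replayer_input_config.json` has this shape; the fields inside each account follow the genesis ledger format, and the entries are elided here:
```json
{
  "genesis_ledger": {
    "accounts": [
      { "pk": "B62q...", "balance": "..." }
    ]
  }
}
```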
Then:
```sh
mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer_input_config.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .
```
where:
- `--migration-mode`: flag for migration
- `--archive-uri`: connection string to the database that will hold the migrated data
- `--input-file`: path to the replayer input file, `replayer_input_config.json`, constructed from the network genesis ledger with the jq command shown above
- `--checkpoint-interval`: frequency of checkpoint files, expressed in block count
- `--checkpoint-output-folder`: path to folder for replayer checkpoint files
Phase 3: Validations
Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated databases.
```sh
mina-berkeley-migration-verifier \
pre-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated
```
where:
- `--mainnet-archive-uri`: connection string to the database to be migrated
- `--migrated-archive-uri`: connection string to the database that will hold the migrated data
Stage 2: Incremental migration
After the initial migration, the data is migrated up to the last canonical block. However, Devnet/Mainnet keeps progressing with new blocks that must also be migrated, repeatedly, until the fork block is announced.
Note: Incremental migration can, and probably must, be repeated several times until the fork block is announced by the Mina Foundation. Run the incremental migration each time with the latest Devnet/Mainnet database and the latest replayer checkpoint file.
Inputs
- Latest Devnet/Mainnet database
- Devnet/Mainnet genesis ledger
- Replayer checkpoint from last run
- Migrated Berkeley database from the initial migration
Outputs
- Migrated Devnet/Mainnet database to the Berkeley format up to the last canonical block
- Replayer checkpoint which can be used for the next incremental migration
Phase 1: Berkeley migration app run
```sh
mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path . \
--keep-precomputed-blocks \
--network NETWORK
```
where:
- `--batch-size`: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.
- `--config-file`: path to the genesis ledger file
- `--mainnet-archive-uri`: connection string to the database to be migrated
- `--migrated-archive-uri`: connection string to the database that will hold the migrated data
- `--blocks-bucket`: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `--precomputed-blocks-local-path`: path to folder for on-disk precomputed blocks location
- `--keep-precomputed-blocks`: keep the precomputed blocks on disk after the migration is complete
- `--network`: the network name (`devnet` or `mainnet`) when determining precomputed blocks. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
Phase 2: Replayer in migration mode run
```sh
mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer-checkpoint-XXX.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .
```
where:
- `--migration-mode`: flag for migration
- `--archive-uri`: connection string to the database that will hold the migrated data
- `--input-file`: path to the latest checkpoint file, `replayer-checkpoint-XXX.json`, generated by the previous migration run
- `--checkpoint-interval`: frequency of checkpoint files, expressed in block count
- `--checkpoint-output-folder`: path to folder for replayer checkpoint files
Incremental migration can be run continuously on top of the initial migration or the last incremental migration until the fork block is announced.
Phase 3: Validations
Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated database.
```sh
mina-berkeley-migration-verifier \
pre-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated
```
where:
- `--mainnet-archive-uri`: connection string to the database to be migrated
- `--migrated-archive-uri`: connection string to the database that will hold the migrated data
Stage 3: Remainder migration
When the fork block is announced, you must tackle the remainder migration. This is the last migration run you need to perform. In this stage, you close the migration cycle by migrating the remaining blocks between the current last canonical block and the fork block (which can be pending, so you do not need to wait 290 blocks for it to become canonical).
You must pass `--fork-state-hash` as an additional parameter to the berkeley_migration app.
Inputs
- Latest Devnet/Mainnet database
- Devnet/Mainnet genesis ledger
- Replayer checkpoint from last run
- Migrated Berkeley database from last run
- Fork block state hash
Outputs
- Migrated Devnet/Mainnet database in the Berkeley format up to the fork point
- Replayer checkpoint that can be used for the next incremental migration
The migrated database output from this stage of the final migration is required to initialize your archive nodes on the upgraded network.
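Because this database is what your archive nodes will start from, it is worth snapshotting it once the final validation passes. A minimal sketch using standard PostgreSQL tooling; the output file name is illustrative:
```sh
pg_dump postgres://postgres:postgres@localhost:5432/migrated \
  > berkeley_migrated_$(date +%F).sql
```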
Phase 1: Berkeley migration app run
```sh
mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path . \
--keep-precomputed-blocks \
--network NETWORK \
--fork-state-hash {fork-state-hash}
```
where:
- `--batch-size`: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.
- `--config-file`: path to the genesis ledger file
- `--mainnet-archive-uri`: connection string to the database to be migrated
- `--migrated-archive-uri`: connection string to the database that will hold the migrated data
- `--blocks-bucket`: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `--precomputed-blocks-local-path`: path to folder for on-disk precomputed blocks location
- `--keep-precomputed-blocks`: keep the precomputed blocks on disk after the migration is complete
- `--network`: the network name (`devnet` or `mainnet`) when determining precomputed blocks. Precomputed blocks are assumed to be named with the format `{network}-{height}-{state_hash}.json`.
- `--fork-state-hash`: fork state hash
Note: When you run the berkeley_migration app with `--fork-state-hash`, there is no requirement for the fork block to be canonical. The tool automatically converts all pending blocks in the subchain, including the fork block, to canonical blocks.
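You can spot-check this conversion after the run; a sketch, assuming the standard `blocks.chain_status` column:
```sh
# After the final migration, no pending blocks should remain below the fork point.
psql postgres://postgres:postgres@localhost:5432/migrated -c \
  "SELECT chain_status, COUNT(*) FROM blocks GROUP BY chain_status;"
```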
Phase 2: Replayer in migration mode run
```sh
mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer-checkpoint-XXX.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .
```
where:
- `--migration-mode`: flag for migration
- `--archive-uri`: connection string to the database that will hold the migrated data
- `--input-file`: path to the latest checkpoint file, `replayer-checkpoint-XXX.json`, generated by the previous migration run
- `--checkpoint-interval`: frequency of checkpoint files, expressed in block count
- `--checkpoint-output-folder`: path to folder for replayer checkpoint files
Phase 3: Validations
Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated databases.
```sh
mina-berkeley-migration-verifier \
post-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--fork-config-file fork_genesis_config.json \
--migrated-replayer-output replayer-checkpoint-XXXX.json
```
where:
- `--mainnet-archive-uri`: connection string to the database to be migrated
- `--migrated-archive-uri`: connection string to the database that will hold the migrated data
- `--migrated-replayer-output`: path to the latest checkpoint file `replayer-checkpoint-XXX.json`
- `--fork-config-file`: path to the fork genesis config file; this is the new genesis config that is distributed with the new daemon and published after the fork block is announced
Example migration steps using Mina Foundation data for Devnet
1. Download and import the archive dump:
```sh
wget -c https://storage.googleapis.com/mina-archive-dumps/devnet-archive-dump-2024-03-27_0000.sql.tar.gz
tar -xf devnet-archive-dump-2024-03-27_0000.sql.tar.gz
psql -U postgres -a -f devnet-archive-dump-2024-03-27_0000.sql
```
2. Download the migration software:
```sh
CODENAME=bullseye
CHANNEL=unstable
VERSION=2.0.0berkeley-rc1-berkeley-c308efc-bullseye
echo "deb [trusted=yes] http://packages.o1test.net $CODENAME $CHANNEL" | tee /etc/apt/sources.list.d/mina.list
apt-get update
apt-get install --allow-downgrades -y "mina-archive-berkeley-archive-migration=$VERSION"
```
3. Create an empty database with schema only:
```sh
wget https://raw.githubusercontent.com/MinaProtocol/mina/berkeley/src/app/archive/zkapp_tables.sql
wget https://raw.githubusercontent.com/MinaProtocol/mina/berkeley/src/app/archive/create_schema.sql
psql -U postgres -c "CREATE DATABASE berkeley_migrated;"
psql -U postgres -d berkeley_migrated -a -f create_schema.sql
```
4. Download the Devnet genesis ledger.
5. Stage 1: Initial migration
5.a) Phase 1:
```sh
mina-berkeley-migration \
--batch-size 2000 \
--config-file /etc/mina/genesis_ledgers/devnet.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost/archive_balances_migrated \
--migrated-archive-uri postgres://postgres:postgres@localhost/berkeley_migrated \
--blocks-bucket mina_network_block_data \
--keep-precomputed-blocks \
--stream-precomputed-blocks \
--network devnet
```
5.b) Phase 2:
```sh
# devnet.json is the genesis file from which the daemon bootstraps on Devnet
jq '.ledger.accounts' devnet.json | jq '{genesis_ledger: {accounts: .}}' > replayer_input_config.json

mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/berkeley_migrated \
--input-file replayer_input_config.json \
--checkpoint-interval 100 \
--checkpoint-file-prefix migration
```
5.c) Phase 3:
```sh
mina-berkeley-migration-verifier \
pre-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost/archive_balances_migrated \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/berkeley_migrated
```
6. Stage 2: Incremental migration
6.a) Phase 1:
```sh
mina-berkeley-migration \
--batch-size 2000 \
--config-file /etc/mina/genesis_ledgers/devnet.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost/archive_balances_migrated \
--migrated-archive-uri postgres://postgres:postgres@localhost/berkeley_migrated \
--blocks-bucket mina_network_block_data \
--network devnet
```
6.b) Phase 2:
```sh
mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/berkeley_migrated \
--input-file migration-checkpoint-XXXX.json \
--checkpoint-interval 100 \
--checkpoint-file-prefix migration
```
6.c) Phase 3:
```sh
mina-berkeley-migration-verifier \
pre-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost/archive_balances_migrated \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/berkeley_migrated
```
7. Stage 3: Remainder migration
7.a) Phase 1:
```sh
mina-berkeley-migration \
--batch-size 2000 \
--config-file /etc/mina/genesis_ledgers/devnet.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost/archive_balances_migrated \
--migrated-archive-uri postgres://postgres:postgres@localhost/berkeley_migrated \
--blocks-bucket mina_network_block_data \
--network devnet \
--fork-state-hash "3NLdCBNrDseiDKvVj8rZ15k2oAUvx4XuCc8mzf6fL2CmqTJVVceM"
```
⚠️ `3NLdCBNrDseiDKvVj8rZ15k2oAUvx4XuCc8mzf6fL2CmqTJVVceM` is only an example hash. Do not use it in the actual migration. Use the official fork-point hash as provided by the Mina Foundation.
7.b) Phase 2:
```sh
mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/berkeley_migrated \
--input-file migration-checkpoint-XXXX.json \
--checkpoint-interval 100 \
--checkpoint-file-prefix migration
```
7.c) Phase 3:
```sh
mina-berkeley-migration-verifier \
post-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost/archive_balances_migrated \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/berkeley_migrated \
--fork-config-file genesis_config_fork.json \
--migrated-replayer-output migration-checkpoint-XXX.json
```
How to verify a successful migration
o1Labs and the Mina Foundation make every effort to provide reliable, high-quality tools. However, it is not possible to eliminate all errors and test all possible Mainnet archive variations.
All important checks are implemented in the `mina-berkeley-migration-verifier` application. However, you can use the following checklist if you want to perform the checks manually:
- All transaction (user command and internal command) hashes are left intact. Verify that the `user_commands` and `internal_commands` tables have the Devnet/Mainnet format of hashes, for example `CkpZirFuoLVV...`.
- The parent-child block relationship is preserved. Verify that a given block in the migrated archive has the same parent (`state_hash` and `parent_hash` columns) as in the Devnet/Mainnet archive that was used as input.
- Account balances remain the same. Verify that the same balance exists for a given block in the Mainnet and migrated databases.
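As a starting point for these manual checks, here is a hedged sketch; the table and column names (`user_commands.hash`, `blocks.state_hash`, `blocks.parent_hash`) are assumed to match the standard archive schema:
```sh
# Spot-check transaction hash format in the migrated database.
psql postgres://postgres:postgres@localhost:5432/migrated -Atc \
  "SELECT hash FROM user_commands LIMIT 5;"   # expect CkpZ...-style hashes

# Compare parent-child relationships between the two databases. Differences
# should be limited to orphaned blocks, which the migration omits by design.
psql postgres://postgres:postgres@localhost:5432/source -Atc \
  "SELECT state_hash, parent_hash FROM blocks ORDER BY state_hash;" > source_pairs.txt
psql postgres://postgres:postgres@localhost:5432/migrated -Atc \
  "SELECT state_hash, parent_hash FROM blocks ORDER BY state_hash;" > migrated_pairs.txt
diff source_pairs.txt migrated_pairs.txt
```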
Tips and tricks
We are aware that the migration process can be very long (a couple of days). Therefore, we encourage you to use cron jobs that migrate data incrementally. The cron job requires access to Google Cloud buckets (or other storage):
- A bucket to store migrated-so-far database dumps
- A bucket to store checkpoint files
We are tightly coupled with Google Cloud infrastructure due to the precomputed block upload mechanism, which is why we also use buckets for storing dumps and checkpoints. However, you do not have to use Google Cloud for anything other than precomputed blocks. With configuration, you can use any gsutil-compatible storage backend (for example, S3).
Before running the cron job, upload an initial database dump and an initial checkpoint file.
To create the files, run these steps locally:
- Download a Devnet/Mainnet archive dump and load it into PostgreSQL.
- Create an empty database using the new archive schema.
- Run the berkeley-migration app against the Devnet/Mainnet and new databases.
- Run the replayer app in migration mode with the `--checkpoint-interval` set to a suitable value (perhaps 100) and start with the original Devnet/Mainnet ledger in the input file.
- Use pg_dump to dump the migrated database and upload it, as in the sketch after this list.
- Upload the most recent checkpoint file.
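A hedged sketch of that dump-and-upload step; the bucket name and file names are placeholders, not official values:
```sh
# Dump the migrated database and push it, plus the newest checkpoint,
# to your own bucket. Adjust the checkpoint glob to match your
# --checkpoint-file-prefix setting.
pg_dump postgres://postgres:postgres@localhost:5432/migrated | gzip > migrated-$(date +%F).sql.gz
gsutil cp "migrated-$(date +%F).sql.gz" gs://<your-migration-bucket>/dumps/
gsutil cp "$(ls -t migration-checkpoint-*.json | head -n 1)" gs://<your-migration-bucket>/checkpoints/
```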
The cron job performs the same steps in an automated fashion:
- Pulls the latest Devnet/Mainnet archive dump and loads it into PostgreSQL.
- Pulls the latest migrated database and loads it into PostgreSQL.
- Pulls the latest checkpoint file.
- Runs the berkeley-migration app against the two databases.
- Runs the replayer app in migration mode using the downloaded checkpoint file; set the checkpoint interval to be smaller (perhaps 50) because there are typically only 200 or so blocks in a day.
- Uploads the migrated database.
- Uploads the most recent checkpoint file.
Be sure to monitor the cron job for errors.
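A condensed, hedged sketch of such a cron job; every bucket, file, and database name here is a placeholder, and the migration commands use the forms shown earlier in this guide:
```sh
#!/usr/bin/env bash
set -euo pipefail

# 1-3: pull the latest inputs.
gsutil cp gs://<dumps-bucket>/<latest-archive-dump>.sql.gz .
gsutil cp gs://<your-bucket>/dumps/migrated-latest.sql.gz .
gsutil cp gs://<your-bucket>/checkpoints/migration-checkpoint-latest.json .
gunzip -f <latest-archive-dump>.sql.gz migrated-latest.sql.gz
psql -U postgres -d source   -f <latest-archive-dump>.sql
psql -U postgres -d migrated -f migrated-latest.sql

# 4: run the berkeley-migration app against the two databases.
mina-berkeley-migration \
  --batch-size 1000 \
  --config-file ledger.json \
  --mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
  --migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
  --blocks-bucket mina_network_block_data \
  --precomputed-blocks-local-path . \
  --keep-precomputed-blocks \
  --network NETWORK

# 5: run the replayer in migration mode; a smaller interval (50) suits
#    the ~200 blocks produced per day.
mina-migration-replayer \
  --migration-mode \
  --archive-uri postgres://postgres:postgres@localhost:5432/migrated \
  --input-file migration-checkpoint-latest.json \
  --checkpoint-interval 50 \
  --checkpoint-output-folder .

# 6-7: upload the results.
pg_dump postgres://postgres:postgres@localhost:5432/migrated | gzip > migrated-latest.sql.gz
gsutil cp migrated-latest.sql.gz gs://<your-bucket>/dumps/
gsutil cp "$(ls -t migration-checkpoint-*.json | head -n 1)" gs://<your-bucket>/checkpoints/
```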
Just before the Berkeley upgrade, migrate the last few blocks by running locally:
- Download the Devnet/Mainnet archive data directly from the k8s PostgreSQL node (not from the archive dump), and load it into PostgreSQL.
- Download the most recent migrated database and load it into PostgreSQL.
- Download the most recent checkpoint file.
- Run the berkeley-migration app against the two databases.
- Run the replayer app in migration mode using the most recent checkpoint file.
It is worthwhile to perform these last steps as a dry run to make sure all goes well. You can run these steps as many times as needed.
Known migration problems
Tips to overcome known challenges.
Berkeley migration app is consuming all of my resources
When running a full migration, you may encounter memory leaks that prevent you from cleanly performing the migration in one pass. A machine with 64 GB of RAM can freeze after ~40k migrated blocks. Every 200 blocks inserted into the database increases the leaked memory by 4-10 MB.
A potential workaround is to split the migration into smaller parts using cron jobs or automation scripts.
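One way to apply this workaround is to let the process exit after each pass and restart it, so leaked memory is released between passes. A hedged sketch using the simplified script from above; the pass count is illustrative:
```sh
# Each pass resumes from the newest checkpoint, so restarting only
# releases memory; no migrated work is lost.
for pass in $(seq 1 50); do
  mina-berkeley-migration-script \
    incremental \
    --genesis-ledger ledger.json \
    --source-db postgres://postgres:postgres@localhost:5432/source \
    --target-db postgres://postgres:postgres@localhost:5432/migrated \
    --blocks-bucket mina_network_block_data \
    --blocks-batch-size 50 \
    --network NETWORK \
    --checkpoint-output-path . \
    --precomputed-blocks-local-path . \
    --replayer-checkpoint "$(ls -t migration-checkpoint-*.json | head -n 1)" \
    || break
done
```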
FAQ
Answers to frequently asked questions about migration.
Migrated database is missing orphaned blocks
By design, the Berkeley migration omits orphaned blocks and, by default, migrates only canonical (and, if set up correctly, pending) blocks.
Replayer in migration mode overrides my old checkpoints
By default, the replayer dumps checkpoints to the current folder. All checkpoint files have a similar format: `replayer-checkpoint-{number}.json`. To prevent overriding old checkpoints, use the `--checkpoint-output-folder` and `--checkpoint-file-prefix` parameters to modify the output folder and prefix.
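For example (folder and prefix are illustrative), a run like this should write files named `stage1-checkpoint-{number}.json` under `/var/lib/mina/checkpoints`, leaving earlier checkpoints untouched:
```sh
mina-migration-replayer \
  --migration-mode \
  --archive-uri postgres://postgres:postgres@localhost:5432/migrated \
  --input-file replayer_input_config.json \
  --checkpoint-interval 10000 \
  --checkpoint-output-folder /var/lib/mina/checkpoints \
  --checkpoint-file-prefix stage1
```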
Replayer in migration mode exits the process in the middle of the run
Most likely, there are some missing blocks in the Devnet/Mainnet database. Ensure that you patched the Devnet/Mainnet archive before starting the migration process. See Devnet/Mainnet database maintenance.
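To check for gaps yourself before migrating, a hedged SQL sketch over the standard `blocks.height` column; any returned height has no block at all in the archive:
```sh
psql postgres://postgres:postgres@localhost:5432/source -c \
  "SELECT s.height AS missing_height
     FROM generate_series(1, (SELECT MAX(height) FROM blocks)) AS s(height)
     LEFT JOIN blocks b ON b.height = s.height
    WHERE b.height IS NULL;"
```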
How to migrate Devnet/Mainnet pending blocks
In the first phase of the migration, use the `--end-global-slot` parameter.
In the second phase of the migration, add the property `target_epoch_ledgers_state_hash` with the expected `state_hash` value:
```json
{
  "target_epoch_ledgers_state_hash": "{target_state_hash}",
  "genesis_ledger": "..."
}
```
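A hedged jq one-liner for adding that property to an existing replayer input file; the file names and the hash are placeholders:
```sh
jq '. + {target_epoch_ledgers_state_hash: "<target_state_hash>"}' \
  replayer_input_config.json > replayer_input_config_with_target.json
```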