## Data Migration

A collection of insights, tips and painful lessons from the data migration

---

### Disclaimer

- Hindsight is always 20/20
- These are data points from a single event

---

## Let's start!

---

## Repeat, repeat, repeat

- Keep repeating your migrations
  - Shortens your feedback loop
  - You want to know ASAP whether your changes are working (sounds familiar?)
- Automate
  - Cronjobs
  - Scheduled stored procedures

---

### The Checklist Manifesto

- Have a checklist of the things to do for the day of the migration
- Rehearse
- Be careful of fatigue (e.g. skipping steps because you think you know them)
- Have tickboxes that you really have to tick
- Automate
  - Shell scripts

---

## Technical debt

- If it lasts months, debt collectors will come knocking
  - Changes will come
- It's worth investing some effort to pay it down or to avoid taking on any

---

## Technical debt (cont.)

- What could have been done/done more of
  - SQL linting
  - Comments
  - Technical constraints that should have been fixed much earlier
  - Tests
- The earlier you build these in, the better

---

Technical debt & bus factor create a vicious cycle

---

## Data, oh you so nasty

- Always validate your assumptions about the data
- DB constraints saved me many, many times

---

## Getting users to clean up

- Give them data in a form that is usable for them to clean up
- What's a reasonable default behaviour if the data is not cleaned up?
- Development effort vs User effort

```python
if trivial:
    dev()
elif user_effort is reasonable:
    user_cleanup()
else:
    dev()
```

---

## Sanity checks

- Never underestimate these
- Saved by these even though we didn't do enough
- Ideally, the migration logic and sanity checks should be implemented by different people

---

## Dear migrated data,
## where were you from?

- Always have data lineage
- Identify where a piece of migrated data came from (e.g. a unique id in the previous system)
- Makes patching easier!

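
A minimal sketch of what lineage columns could look like, using Python's built-in `sqlite3`. The `customer` table, its columns and the sample rows are hypothetical and not taken from the actual migration. The idea is that every migrated row carries the source system and its id there, so a patch can be addressed by the old id. The `UNIQUE` constraint also makes an accidental re-run fail loudly instead of duplicating rows.

```python
import sqlite3


def migrate_customers(conn, legacy_rows, source_system="SYS1"):
    """Copy (legacy_id, name) rows into the new table, recording their origin."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS customer (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            source_system TEXT NOT NULL,  -- lineage: system the row came from
            source_id     TEXT NOT NULL,  -- lineage: the row's id in that system
            UNIQUE (source_system, source_id)
        )
        """
    )
    conn.executemany(
        "INSERT INTO customer (name, source_system, source_id) VALUES (?, ?, ?)",
        [(name, source_system, str(legacy_id)) for legacy_id, name in legacy_rows],
    )


def patch_name(conn, source_system, source_id, new_name):
    """Patch one migrated row, addressed by its id in the previous system."""
    cur = conn.execute(
        "UPDATE customer SET name = ? WHERE source_system = ? AND source_id = ?",
        (new_name, source_system, str(source_id)),
    )
    return cur.rowcount  # number of rows actually patched


# Hypothetical sample data: legacy id 102 was migrated with a typo.
conn = sqlite3.connect(":memory:")
migrate_customers(conn, [(101, "Alice"), (102, "Bbo")])
patched = patch_name(conn, "SYS1", 102, "Bob")
names = [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY source_id")]
```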
---

## Git is our friend

- Always check in your
  - Migration scripts
  - Scripts that generate user clean up lists
  - Reference data lists

---

"Are you testing migration now? Let me know once you're done so I can test mine" x 1000..

---

## De-coupling teams

- Data coupling
  - SYS1 -> SYS2
  - SYS1 flushes data to test -> SYS2 cannot test
  - SYS2 needs data -> SYS1 cannot test
- De-coupling teams
  - Two working sets of data

---

## What did I miss out?