Add post about my Mastodon DB hiccup
parent 7d13a05f32
commit 4429b381d0
169 content/blog/mastodon-database-hickup.rst Normal file
@@ -0,0 +1,169 @@

Mastodon database hiccup
########################

:date: 2020-04-23T11:17Z
:category: blog
:tags: postgresql,mastodon
:url: blog/mastodon-database-hickup/
:save_as: blog/mastodon/database-hickup/index.html
:status: published
:author: Gergely Polonkai

After being offline for almost a year, my `Mastodon instance <https://social.polonkai.eu>`_ is back
up. However, it wasn’t as straightforward as i wanted it to be.

About a year ago my Mastodon server went down. I messed up my Docker images big time, and the
then-new version 2.8.4 didn’t want to start up. I left it there, and as i had a pretty rough year
(in the good sense) i didn’t have time to bring it up again.

One of the constant problems i had with this instance was the size of the attachment directory. It
easily grew to an enormous size in no time, so i constantly had to clean things up. Plus, it took
precious space away from my other projects. So my first move was to move all these files to a
DigitalOcean Space; with the ``awscli`` Python package it was easy as pie. If you want to do it,
too, i followed `this article
<https://angristan.xyz/2018/05/moving-mastodon-media-files-to-wasabi-object-storage/>`_.

Next came the database.

I tried to load the database dump, created with ``pg_dump -f mastodon.sql mastodon``, using
``psql -f mastodon.sql``. Everything went fine and all the records were created, except for this
error message:

.. code-block:: text

    psql:mastodon-dump.sql:1691661: ERROR:  could not create unique index "index_tags_on_name"
    DETAIL:  Key (name)=(opensource) is duplicated.

First i tried to ignore it, but running ``rails db:migrate`` failed when it wanted to manipulate
this index.
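
The failing statement sits near the end of the dump: ``pg_dump`` emits the table data first and
recreates the indexes afterwards, which is why every row loaded fine before the unique index
failed. Judging by the error message, the statement was something like this (reconstructed, not
copied from the dump):

.. code-block:: sql

    -- Reconstructed from the error message; the dump only creates indexes
    -- after all the COPY statements have loaded the data.
    CREATE UNIQUE INDEX index_tags_on_name ON public.tags USING btree (name);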

So i issued this query:

.. code-block:: sql

    SELECT COUNT(*) AS count, name FROM tags GROUP BY name HAVING COUNT(*) > 1;

And the result was this:

.. code-block:: text

     count |    name
    -------+-------------
         2 | opensource
         2 | xmpp
         2 | fdroid
         2 | socialmedia
         2 | blogging
         2 | c64
         2 | blog

Then i collected the IDs of these duplicate tag records using this query:

.. code-block:: sql

    SELECT id, name, created_at
    FROM tags
    WHERE name IN ('opensource',
                   'xmpp',
                   'fdroid',
                   'socialmedia',
                   'blogging',
                   'c64',
                   'blog')
    ORDER BY name, created_at;

.. code-block:: text

       id  |    name     |         created_at
    -------+-------------+----------------------------
       159 | blog        | 2018-02-24 19:56:01.710702
     15941 | blog        | 2019-04-03 22:05:56.438488
       158 | blogging    | 2018-02-24 19:56:01.721354
     16006 | blogging    | 2019-04-09 12:13:20.852976
      5441 | c64         | 2018-07-31 00:50:56.172468
     16036 | c64         | 2019-04-14 19:56:40.692197
       924 | fdroid      | 2018-04-14 19:39:21.261817
     15947 | fdroid      | 2019-04-04 19:10:50.190317
       237 | opensource  | 2018-03-05 21:50:52.609723
     15929 | opensource  | 2019-04-03 11:49:21.772961
       251 | socialmedia | 2018-03-06 23:02:33.573775
     16034 | socialmedia | 2019-04-14 19:07:37.081635
       519 | xmpp        | 2018-03-28 16:02:05.023784
     15988 | xmpp        | 2019-04-07 08:06:59.429965

I wanted to see the ``created_at`` field just to know when things got bad. As it turns out, it
was around April 2019, so it might have happened with the 2.8.0 upgrade. Whether it was me or
some problem with the migrations remains a mystery.
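
In hindsight, a single self-join could have produced these old/new pairs in one step; a sketch i
didn’t run at the time:

.. code-block:: sql

    -- Pair the oldest row of each name (the one to keep) with its newer duplicate.
    SELECT old.id AS keep_id, dup.id AS dup_id, old.name
    FROM tags AS old
    JOIN tags AS dup
      ON dup.name = old.name
     AND dup.created_at > old.created_at
    ORDER BY old.name;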

I also wanted to know about all the tables referencing ``tags``:

.. code-block:: sql

    SELECT
        tc.table_name, kcu.column_name
    FROM
        information_schema.table_constraints AS tc
        JOIN information_schema.key_column_usage AS kcu
          ON tc.constraint_name = kcu.constraint_name
        JOIN information_schema.constraint_column_usage AS ccu
          ON ccu.constraint_name = tc.constraint_name
    WHERE tc.constraint_type = 'FOREIGN KEY' AND ccu.table_name = 'tags';

.. code-block:: text

        table_name     | column_name
    -------------------+-------------
     statuses_tags     | tag_id
     account_tag_stats | tag_id

And last but not least, i had to massage the dump a bit. After creating a backup, i opened up the
file in Vim and looked for the place where it loads the data into the ``tags`` table. It looked
like this:

.. code-block:: sql

    COPY public.tags (name, created_at, updated_at, id) FROM stdin;
    blogging     2018-02-24 19:56:01.710702  2018-02-24 19:56:01.710702  158
    blog         2018-02-24 19:56:01.721354  2018-02-24 19:56:01.721354  159
    opensource   2018-03-05 21:50:52.609723  2018-03-05 21:50:52.609723  237
    socialmedia  2018-03-06 23:02:33.573775  2018-03-06 23:02:33.573775  251
    xmpp         2018-03-28 16:02:05.023784  2018-03-28 16:02:05.023784  519
    fdroid       2018-04-14 19:39:21.261817  2018-04-14 19:39:21.261817  924
    c64          2018-07-31 00:50:56.172468  2018-07-31 00:50:56.172468  5441
    opensource   2019-04-03 11:49:21.772961  2019-04-03 11:49:21.772961  15929
    blog         2019-04-03 22:05:56.438488  2019-04-03 22:05:56.438488  15941
    fdroid       2019-04-04 19:10:50.190317  2019-04-04 19:10:50.190317  15947
    xmpp         2019-04-07 08:06:59.429965  2019-04-07 08:06:59.429965  15988
    blogging     2019-04-09 12:13:20.852976  2019-04-09 12:13:20.852976  16006
    socialmedia  2019-04-14 19:07:37.081635  2019-04-14 19:07:37.081635  16034
    c64          2019-04-14 19:56:40.692197  2019-04-14 19:56:40.692197  16036

For every tag, i removed the line with the later date and noted its ID, then went on to the
``statuses_tags`` table, which looked like this (i don’t include all the rows here as they would
take up a lot of space):

.. code-block:: sql

    COPY public.statuses_tags (status_id, tag_id) FROM stdin;
    101862091116861098  15929

Here, i had to look for the previously noted IDs in the last column (i.e. at the end of a line),
and on every matching line i changed the newer ID to the old one (15929 became 237, and so on).

Finally, the ``account_tag_stats`` table:

.. code-block:: sql

    COPY public.account_tag_stats (id, tag_id, accounts_count, hidden, created_at, updated_at) FROM stdin;
    1  15929  0  f  2019-04-03 11:49:21.810564  2019-04-03 11:49:21.810564

Here, the ID i was looking for was in the second column, so i used the magical Vim regex
``^[0-9]\+\t\(15929\|15941\)`` (with all the other IDs in the parentheses, too), and again
changed the new IDs to the old ones.
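
If i ever have to do this again, the same surgery could probably be done inside the database
instead of a text editor. A rough sketch, reusing the self-join from above and assuming the rows
are loaded *before* the unique index is created:

.. code-block:: sql

    -- Rough sketch, not what i actually ran: de-duplicate inside the
    -- database instead of editing the dump. Assumes the rows are already
    -- loaded and the unique index does not exist yet.
    BEGIN;

    -- Map every newer duplicate to the oldest id carrying the same name.
    CREATE TEMPORARY TABLE tag_remap AS
    SELECT dup.id AS dup_id, old.id AS keep_id
    FROM tags AS dup
    JOIN tags AS old
      ON old.name = dup.name
     AND old.created_at < dup.created_at;

    -- Repoint the two referencing tables found earlier.
    UPDATE statuses_tags AS st
       SET tag_id = r.keep_id
      FROM tag_remap AS r
     WHERE st.tag_id = r.dup_id;

    UPDATE account_tag_stats AS ats
       SET tag_id = r.keep_id
      FROM tag_remap AS r
     WHERE ats.tag_id = r.dup_id;

    -- Drop the orphaned duplicates; the unique index can be built afterwards.
    DELETE FROM tags WHERE id IN (SELECT dup_id FROM tag_remap);

    COMMIT;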

I still can’t believe i didn’t mess up the database too much, but the dump loaded just fine, and
the ``db:migrate`` command executed without a hitch.

I also had some problems with the nginx reverse proxying, but it all turned out well, and my
instance is back online after a year. Come follow me at `@gergely@polonkai.eu
<https://social.polonkai.eu/@gergely>`_!