From 4429b381d05eae2c8346fd1ba646d8b2ac7d5577 Mon Sep 17 00:00:00 2001 From: Gergely Polonkai Date: Thu, 23 Apr 2020 14:49:32 +0200 Subject: [PATCH] Add post about my Mastodon DB hickup --- content/blog/mastodon-database-hickup.rst | 169 ++++++++++++++++++++++ 1 file changed, 169 insertions(+) create mode 100644 content/blog/mastodon-database-hickup.rst diff --git a/content/blog/mastodon-database-hickup.rst b/content/blog/mastodon-database-hickup.rst new file mode 100644 index 0000000..609bf71 --- /dev/null +++ b/content/blog/mastodon-database-hickup.rst @@ -0,0 +1,169 @@ +Mastodon database hickup +######################## + +:date: 2020-04-23T11:17Z +:category: blog +:tags: postgresql,mastodon +:url: blog/mastodon-database-hickup/ +:save_as: blog/mastodon/database-hickup/index.html +:status: published +:author: Gergely Polonkai + + +After being online for almost a year, my `Mastodon instance `_ is back +up. However, it wasn’t as straightforward as i wanted it to be. + +About a year ago my Mastodon server went down. I messed up my docker images big time and the +then-new version 2.8.4 didn’t want to start up. I left it there, and as i had a pretty rough year +(in the good sense) i didn’t have time to bring it up again. + +One of the constant problems i had with this instance is the size of the attachment directory. It +easily grew to an enormous size in no time, so i had to constantly clean things up. Plus, it took +away the precious space away from my other projects. So my first move was moving all these files +to a DigitalOcean Space; with the ``awscli`` Python package it was easy as pie. If you want to do +it, too, i followed `this article +`_. + +Next came the database. + +I tried to load the database dump, created using ``pg_dump -f mastodon.sql mastodon``, with +``psql -f mastodon.sql``. Everything went fine, all the records created, except this error +message: + +.. code-block:: text + + psql:mastodon-dump.sql:1691661: ERROR: could not create unique index "index_tags_on_name" + DETAIL: Key (name)=(opensource) is duplicated. + +First i tried to ignore it, but running `rails db:migrate` failed when it wanted to manipulate +this index. + +So i issued this query: + +.. code-block:: sql + + SELECT COUNT(*) AS count, name FROM tags GROUP BY name HAVING COUNT(*) > 1; + +And the result was this: + +.. code-block:: text + + count | name + 2 | opensource + 2 | xmpp + 2 | fdroid + 2 | socialmedia + 2 | blogging + 2 | c64 + 2 | blog + +Then collected the IDs of these duplicate tag records using this query: + +.. code-block:: sql + + SELECT id, name, created_at + FROM tags + WHERE name in ('opensource', + 'xmpp', + 'fdroid', + 'socialmedia', + 'blogging', + 'c64', + 'blog') + ORDER BY name, created_at; + +.. code-block:: text + + id | name | created_at + -------+-------------+------------ + 159 | blog | 2018-02-24 19:56:01.710702 + 15941 | blog | 2019-04-03 22:05:56.438488 + 158 | blogging | 2018-02-24 19:56:01.721354 + 16006 | blogging | 2019-04-09 12:13:20.852976 + 5441 | c64 | 2018-07-31 00:50:56.172468 + 16036 | c64 | 2019-04-14 19:56:40.692197 + 924 | fdroid | 2018-04-14 19:39:21.261817 + 15947 | fdroid | 2019-04-04 19:10:50.190317 + 237 | opensource | 2018-03-05 21:50:52.609723 + 15929 | opensource | 2019-04-03 11:49:21.772961 + 251 | socialmedia | 2018-03-06 23:02:33.573775 + 16034 | socialmedia | 2019-04-14 19:07:37.081635 + 519 | xmpp | 2018-03-28 16:02:05.023784 + 15988 | xmpp | 2019-04-07 08:06:59.429965 + +I wanted to see the ``created_at`` field just to know when things got bad. As it turns out, it +was around April 2019, so it might have happened with the 2.8.0 upgrade. Was it me, or some +problem with the migrations remains a mistery. + +I also wanted to know about all the tables referencing tags: + +.. code-block:: sql + + SELECT + tc.table_name, kcu.column_name + FROM + information_schema.table_constraints AS tc + JOIN information_schema.key_column_usage AS kcu + ON tc.constraint_name = kcu.constraint_name + JOIN information_schema.constraint_column_usage AS ccu + ON ccu.constraint_name = tc.constraint_name + WHERE constraint_type = 'FOREIGN KEY' AND ccu.table_name = 'tags'; + +.. code-block:: text + + table_name | column_name + -------------------+------------- + statuses_tags | tag_id + account_tag_stats | tag_id + +And last, but not least, i had to massage the dump a bit. After creating a backup, i opened up +the file in ViM and looked for the place where it was loading the data into the ``tags`` table. +It looked like this: + +.. code-block:: sql + + COPY public.tags (name, created_at, updated_at, id) FROM stdin; + blogging 2018-02-24 19:56:01.710702 2018-02-24 19:56:01.710702 158 + blog 2018-02-24 19:56:01.721354 2018-02-24 19:56:01.721354 159 + opensource 2018-03-05 21:50:52.609723 2018-03-05 21:50:52.609723 237 + socialmedia 2018-03-06 23:02:33.573775 2018-03-06 23:02:33.573775 251 + xmpp 2018-03-28 16:02:05.023784 2018-03-28 16:02:05.023784 519 + fdroid 2018-04-14 19:39:21.261817 2018-04-14 19:39:21.261817 924 + c64 2018-07-31 00:50:56.172468 2018-07-31 00:50:56.172468 5441 + opensource 2019-04-03 11:49:21.772961 2019-04-03 11:49:21.772961 15929 + blog 2019-04-03 22:05:56.438488 2019-04-03 22:05:56.438488 15941 + fdroid 2019-04-04 19:10:50.190317 2019-04-04 19:10:50.190317 15947 + xmpp 2019-04-07 08:06:59.429965 2019-04-07 08:06:59.429965 15988 + blogging 2019-04-09 12:13:20.852976 2019-04-09 12:13:20.852976 16006 + socialmedia 2019-04-14 19:07:37.081635 2019-04-14 19:07:37.081635 16034 + c64 2019-04-14 19:56:40.692197 2019-04-14 19:56:40.692197 16036 + +For every tag, i removed the line with the latest date, and noted their IDs, then went on to the +``statuses_tags`` table, which looked like this (i don’t put all rows here as it would take a lot +of space): + +.. code-block:: sql + + COPY public.statuses_tags (status_id, tag_id) FROM stdin; + 101862091116861098 15929 + +Here, i had to look for the IDs noted before in the last column (ie. at the end of a line), and +for every line i change the newer ID to the old ones (15929 became 237, and so on). + +Finally, the ``account_tag_stats`` table: + +.. code-block:: sql + + COPY public.account_tag_stats (id, tag_id, accounts_count, hidden, created_at, updated_at) FROM stdin; + 1 15929 0 f 2019-04-03 11:49:21.810564 2019-04-03 11:49:21.810564 + +Here, the ID i was looking for is in the second column, so i used the magical ViM regex +``^[0-9]\+\t\(15929\|15941\)`` (but with all the other IDs in the parentheses, too), and again +changed the new IDs to the old ones. + +I still can’t believe i didn’t mess up the database too much, but it loaded just fine, and the +``db:migrate`` command executet just fine. + +I also had some problems with so nginx reverse proxying, but it all turned out well, and my +instance is back online after a year. Come follow me at `@gergely@polonkai.eu +`_!