Mastodon database hiccup
########################
:date: 2020-04-23T11:17Z
:category: blog
:tags: postgresql,mastodon
:url: blog/mastodon-database-hickup/
:save_as: blog/mastodon-database-hickup/index.html
:status: published
:author: Gergely Polonkai

After being online for almost a year, my `Mastodon instance <https://social.polonkai.eu>`_ is back
up. However, it wasn’t as straightforward as I wanted it to be.

About a year ago my Mastodon server went down. I messed up my Docker images big time, and the
then-new version 2.8.4 didn’t want to start up. I left it there, and as I had a pretty rough year
(in the good sense), I didn’t have time to bring it back up.

One of the constant problems I had with this instance was the size of the attachment directory. It
easily grew to an enormous size in no time, so I constantly had to clean things up. On top of
that, it took precious space away from my other projects. So my first move was to move all these
files to a DigitalOcean Space; with the ``awscli`` Python package it was easy as pie. If you want
to do it too, I followed `this article
<https://angristan.xyz/2018/05/moving-mastodon-media-files-to-wasabi-object-storage/>`_.

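
The transfer itself boils down to an S3-style sync against the Spaces endpoint. A minimal sketch,
where the bucket name, the ``ams3`` endpoint, and the profile name are placeholders for your own
setup, not my actual configuration:

.. code-block:: shell

   # One-time setup: store the Spaces access key and secret in a profile.
   #   aws configure --profile digitalocean
   # Then sync Mastodon's media directory to the Space
   # ("my-mastodon-media" and the ams3 endpoint are hypothetical examples).
   aws s3 sync public/system/ s3://my-mastodon-media/ \
       --endpoint-url https://ams3.digitaloceanspaces.com \
       --acl public-read \
       --profile digitalocean
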
Next came the database.

I tried to load the database dump, created using ``pg_dump -f mastodon.sql mastodon``, with
``psql -f mastodon.sql``. Everything went fine and all the records were created, except for this
error message:


.. code-block:: text

   psql:mastodon-dump.sql:1691661: ERROR: could not create unique index "index_tags_on_name"
   DETAIL: Key (name)=(opensource) is duplicated.

At first I tried to ignore it, but running ``rails db:migrate`` failed when it wanted to
manipulate this index.

So I issued this query:


.. code-block:: sql

   SELECT COUNT(*) AS count, name FROM tags GROUP BY name HAVING COUNT(*) > 1;

And the result was this:

.. code-block:: text

   count | name
   ------+-------------
       2 | opensource
       2 | xmpp
       2 | fdroid
       2 | socialmedia
       2 | blogging
       2 | c64
       2 | blog

Then I collected the IDs of these duplicate tag records using this query:


.. code-block:: sql

   SELECT id, name, created_at
   FROM tags
   WHERE name IN ('opensource',
                  'xmpp',
                  'fdroid',
                  'socialmedia',
                  'blogging',
                  'c64',
                  'blog')
   ORDER BY name, created_at;


.. code-block:: text

      id  | name        | created_at
   -------+-------------+----------------------------
     159  | blog        | 2018-02-24 19:56:01.710702
   15941  | blog        | 2019-04-03 22:05:56.438488
     158  | blogging    | 2018-02-24 19:56:01.721354
   16006  | blogging    | 2019-04-09 12:13:20.852976
    5441  | c64         | 2018-07-31 00:50:56.172468
   16036  | c64         | 2019-04-14 19:56:40.692197
     924  | fdroid      | 2018-04-14 19:39:21.261817
   15947  | fdroid      | 2019-04-04 19:10:50.190317
     237  | opensource  | 2018-03-05 21:50:52.609723
   15929  | opensource  | 2019-04-03 11:49:21.772961
     251  | socialmedia | 2018-03-06 23:02:33.573775
   16034  | socialmedia | 2019-04-14 19:07:37.081635
     519  | xmpp        | 2018-03-28 16:02:05.023784
   15988  | xmpp        | 2019-04-07 08:06:59.429965

I wanted to see the ``created_at`` field just to know when things went bad. As it turns out, it
was around April 2019, so it might have happened with the 2.8.0 upgrade. Whether it was me or some
problem with the migrations remains a mystery.

I also wanted to know about all the tables referencing ``tags``:


.. code-block:: sql

   SELECT
       tc.table_name, kcu.column_name
   FROM
       information_schema.table_constraints AS tc
       JOIN information_schema.key_column_usage AS kcu
         ON tc.constraint_name = kcu.constraint_name
       JOIN information_schema.constraint_column_usage AS ccu
         ON ccu.constraint_name = tc.constraint_name
   WHERE constraint_type = 'FOREIGN KEY' AND ccu.table_name = 'tags';


.. code-block:: text

       table_name     | column_name
   -------------------+-------------
    statuses_tags     | tag_id
    account_tag_stats | tag_id

And last but not least, I had to massage the dump a bit. After creating a backup, I opened the
file in Vim and looked for the place where it loads the data into the ``tags`` table. It looked
like this:


.. code-block:: sql

   COPY public.tags (name, created_at, updated_at, id) FROM stdin;
   blogging     2018-02-24 19:56:01.710702  2018-02-24 19:56:01.710702  158
   blog         2018-02-24 19:56:01.721354  2018-02-24 19:56:01.721354  159
   opensource   2018-03-05 21:50:52.609723  2018-03-05 21:50:52.609723  237
   socialmedia  2018-03-06 23:02:33.573775  2018-03-06 23:02:33.573775  251
   xmpp         2018-03-28 16:02:05.023784  2018-03-28 16:02:05.023784  519
   fdroid       2018-04-14 19:39:21.261817  2018-04-14 19:39:21.261817  924
   c64          2018-07-31 00:50:56.172468  2018-07-31 00:50:56.172468  5441
   opensource   2019-04-03 11:49:21.772961  2019-04-03 11:49:21.772961  15929
   blog         2019-04-03 22:05:56.438488  2019-04-03 22:05:56.438488  15941
   fdroid       2019-04-04 19:10:50.190317  2019-04-04 19:10:50.190317  15947
   xmpp         2019-04-07 08:06:59.429965  2019-04-07 08:06:59.429965  15988
   blogging     2019-04-09 12:13:20.852976  2019-04-09 12:13:20.852976  16006
   socialmedia  2019-04-14 19:07:37.081635  2019-04-14 19:07:37.081635  16034
   c64          2019-04-14 19:56:40.692197  2019-04-14 19:56:40.692197  16036

For every tag, I removed the line with the later date and noted its ID, then went on to the
``statuses_tags`` table, which looked like this (I won’t include all the rows here, as that would
take up a lot of space):


.. code-block:: sql

   COPY public.statuses_tags (status_id, tag_id) FROM stdin;
   101862091116861098  15929

Here I had to look for the previously noted IDs in the last column (i.e. at the end of a line),
and on every matching line I changed the newer ID to the old one (15929 became 237, and so on).

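
Instead of hand-editing the dump, the same remapping could also be done in SQL after loading it. A
sketch for a single pair, using the 15929 → 237 mapping above (assuming no status was tagged with
both copies of a tag, which would trip a uniqueness constraint on update):

.. code-block:: sql

   -- Repoint rows from the newer duplicate tag to the original one,
   -- then drop the duplicate. Repeat for each (new, old) ID pair.
   UPDATE statuses_tags     SET tag_id = 237 WHERE tag_id = 15929;
   UPDATE account_tag_stats SET tag_id = 237 WHERE tag_id = 15929;
   DELETE FROM tags WHERE id = 15929;

Only ``statuses_tags`` and ``account_tag_stats`` need updating; the ``information_schema`` query
earlier showed those are the only tables referencing ``tags``.
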
Finally, the ``account_tag_stats`` table:

.. code-block:: sql

   COPY public.account_tag_stats (id, tag_id, accounts_count, hidden, created_at, updated_at) FROM stdin;
   1  15929  0  f  2019-04-03 11:49:21.810564  2019-04-03 11:49:21.810564

Here the ID I was looking for is in the second column, so I used the magical Vim regex
``^[0-9]\+\t\(15929\|15941\)`` (but with all the other new IDs in the parentheses, too), and again
changed the new IDs to the old ones.

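
The same column-aware substitution can also be scripted outside the editor. A hedged ``sed``
equivalent of that Vim pattern, shown here on a fabricated two-row sample rather than the real
dump (GNU sed assumed, for ``\t`` support):

```shell
# Two sample rows in the account_tag_stats COPY format:
# id, tag_id, accounts_count, hidden (tab-separated)
printf '1\t15929\t0\tf\n2\t100\t0\tf\n' > sample.tsv

# Rewrite tag_id 15929 to 237 only when it is the second column,
# i.e. preceded by a numeric id and a tab at the start of the line.
sed -E 's/^([0-9]+\t)15929\t/\1237\t/' sample.tsv
```

The capture group keeps the row ID intact; extend the pattern with alternation (as in the Vim
regex) to cover all the new/old ID pairs in one pass.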
I still can’t believe I didn’t mess up the database too much, but it loaded just fine, and the
``db:migrate`` command also ran without a hitch.

I also had some problems with the nginx reverse proxying, but it all turned out well, and my
instance is back online after a year. Come follow me at `@gergely@polonkai.eu
<https://social.polonkai.eu/@gergely>`_!