Mastodon database hiccup
########################

:date: 2020-04-23T11:17Z
:category: blog
:tags: postgresql,mastodon
:url: blog/mastodon-database-hickup/
:save_as: blog/mastodon-database-hickup/index.html
:status: published
:author: Gergely Polonkai

After being offline for almost a year, my `Mastodon instance <https://social.polonkai.eu>`_ is back
up. However, getting there wasn’t as straightforward as I wanted it to be.

About a year ago my Mastodon server went down. I messed up my Docker images big time and the
then-new version 2.8.4 didn’t want to start up. I left it there, and as I had a pretty rough year
(in the good sense) I didn’t have time to bring it up again.

One of the constant problems I had with this instance was the size of the attachment directory. It
easily grew to an enormous size in no time, so I had to clean things up constantly. Plus, it took
precious space away from my other projects. So my first move was moving all these files to a
DigitalOcean Space; with the ``awscli`` Python package it was easy as pie. If you want to do it,
too, I followed `this article
<https://angristan.xyz/2018/05/moving-mastodon-media-files-to-wasabi-object-storage/>`_.

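The move itself boils down to a couple of ``awscli`` invocations. A minimal sketch, assuming a
Space named ``mastodon-media`` in the ``ams3`` region and the stock ``public/system`` media
directory (your Space name, region, and paths will differ):

```shell
# Configure a profile with the Space's access key pair first:
#   aws configure --profile digitalocean
#
# Then copy the media directory into the Space. --endpoint-url points
# awscli at DigitalOcean instead of AWS; --acl public-read makes the
# attachments publicly fetchable, which Mastodon expects.
aws s3 sync public/system/ s3://mastodon-media/ \
    --profile digitalocean \
    --endpoint-url https://ams3.digitaloceanspaces.com \
    --acl public-read
```

``aws s3 sync`` only uploads files that are missing or changed on the remote side, so it is safe
to re-run until everything is across.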
Next came the database.

I tried to load the database dump, created with ``pg_dump -f mastodon.sql mastodon``, using
``psql -f mastodon.sql``. Everything went fine and all the records were created, except for this
error message:

.. code-block:: text

   psql:mastodon-dump.sql:1691661: ERROR: could not create unique index "index_tags_on_name"
   DETAIL: Key (name)=(opensource) is duplicated.

First I tried to ignore it, but running ``rails db:migrate`` failed when it tried to manipulate
this index.

So I issued this query:

.. code-block:: sql

   SELECT COUNT(*) AS count, name FROM tags GROUP BY name HAVING COUNT(*) > 1;

And the result was this:

.. code-block:: text

   count |    name
   ------+-------------
       2 | opensource
       2 | xmpp
       2 | fdroid
       2 | socialmedia
       2 | blogging
       2 | c64
       2 | blog

Then I collected the IDs of these duplicate tag records using this query:

.. code-block:: sql

   SELECT id, name, created_at
     FROM tags
    WHERE name IN ('opensource',
                   'xmpp',
                   'fdroid',
                   'socialmedia',
                   'blogging',
                   'c64',
                   'blog')
    ORDER BY name, created_at;

.. code-block:: text

       id | name        | created_at
   -------+-------------+----------------------------
      159 | blog        | 2018-02-24 19:56:01.710702
    15941 | blog        | 2019-04-03 22:05:56.438488
      158 | blogging    | 2018-02-24 19:56:01.721354
    16006 | blogging    | 2019-04-09 12:13:20.852976
     5441 | c64         | 2018-07-31 00:50:56.172468
    16036 | c64         | 2019-04-14 19:56:40.692197
      924 | fdroid      | 2018-04-14 19:39:21.261817
    15947 | fdroid      | 2019-04-04 19:10:50.190317
      237 | opensource  | 2018-03-05 21:50:52.609723
    15929 | opensource  | 2019-04-03 11:49:21.772961
      251 | socialmedia | 2018-03-06 23:02:33.573775
    16034 | socialmedia | 2019-04-14 19:07:37.081635
      519 | xmpp        | 2018-03-28 16:02:05.023784
    15988 | xmpp        | 2019-04-07 08:06:59.429965

I wanted to see the ``created_at`` field just to know when things went bad. As it turns out, it
was around April 2019, so it might have happened with the 2.8.0 upgrade. Whether it was me or
some problem with the migrations remains a mystery.

I also wanted to know about all the tables referencing ``tags``:

.. code-block:: sql

   SELECT
       tc.table_name, kcu.column_name
   FROM
       information_schema.table_constraints AS tc
       JOIN information_schema.key_column_usage AS kcu
         ON tc.constraint_name = kcu.constraint_name
       JOIN information_schema.constraint_column_usage AS ccu
         ON ccu.constraint_name = tc.constraint_name
   WHERE constraint_type = 'FOREIGN KEY' AND ccu.table_name = 'tags';

.. code-block:: text

    table_name        | column_name
   -------------------+-------------
    statuses_tags     | tag_id
    account_tag_stats | tag_id

And last, but not least, I had to massage the dump a bit. After creating a backup, I opened the
file in Vim and looked for the place where it loads the data into the ``tags`` table. It looked
like this:

.. code-block:: sql

   COPY public.tags (name, created_at, updated_at, id) FROM stdin;
   blogging	2018-02-24 19:56:01.710702	2018-02-24 19:56:01.710702	158
   blog	2018-02-24 19:56:01.721354	2018-02-24 19:56:01.721354	159
   opensource	2018-03-05 21:50:52.609723	2018-03-05 21:50:52.609723	237
   socialmedia	2018-03-06 23:02:33.573775	2018-03-06 23:02:33.573775	251
   xmpp	2018-03-28 16:02:05.023784	2018-03-28 16:02:05.023784	519
   fdroid	2018-04-14 19:39:21.261817	2018-04-14 19:39:21.261817	924
   c64	2018-07-31 00:50:56.172468	2018-07-31 00:50:56.172468	5441
   opensource	2019-04-03 11:49:21.772961	2019-04-03 11:49:21.772961	15929
   blog	2019-04-03 22:05:56.438488	2019-04-03 22:05:56.438488	15941
   fdroid	2019-04-04 19:10:50.190317	2019-04-04 19:10:50.190317	15947
   xmpp	2019-04-07 08:06:59.429965	2019-04-07 08:06:59.429965	15988
   blogging	2019-04-09 12:13:20.852976	2019-04-09 12:13:20.852976	16006
   socialmedia	2019-04-14 19:07:37.081635	2019-04-14 19:07:37.081635	16034
   c64	2019-04-14 19:56:40.692197	2019-04-14 19:56:40.692197	16036

For every tag, I removed the line with the latest date and noted its ID, then went on to the
``statuses_tags`` table, which looked like this (I won’t include all the rows here, as that would
take a lot of space):

.. code-block:: sql

   COPY public.statuses_tags (status_id, tag_id) FROM stdin;
   101862091116861098	15929

Here, I had to look for the IDs noted before in the last column (i.e. at the end of each line),
and for every such line I changed the newer ID to the old one (15929 became 237, and so on).

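This last-column rewrite is mechanical enough to script instead of doing it by hand in Vim. A
small sketch of the same remapping (the ID pairs come from the ``tags`` listing above; the helper
name is my own invention):

```python
# Map each newer duplicate tag ID to the original one, as collected
# from the tags listing above (newer ID on the left, original on the
# right).
ID_MAP = {
    "15929": "237",   # opensource
    "15988": "519",   # xmpp
    "15947": "924",   # fdroid
    "16034": "251",   # socialmedia
    "16006": "158",   # blogging
    "16036": "5441",  # c64
    "15941": "159",   # blog
}


def remap_tag_id(line):
    """Rewrite the tag_id (last, tab-separated column) of a COPY row."""
    columns = line.rstrip("\n").split("\t")
    columns[-1] = ID_MAP.get(columns[-1], columns[-1])
    return "\t".join(columns)
```

Running every data row of the ``statuses_tags`` COPY section through this helper performs exactly
the substitution described above, leaving rows with non-duplicate IDs untouched.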
Finally, the ``account_tag_stats`` table:

.. code-block:: sql

   COPY public.account_tag_stats (id, tag_id, accounts_count, hidden, created_at, updated_at) FROM stdin;
   1	15929	0	f	2019-04-03 11:49:21.810564	2019-04-03 11:49:21.810564

Here, the ID I was looking for is in the second column, so I used the magical Vim regex
``^[0-9]\+\t\(15929\|15941\)`` (but with all the other IDs in the parentheses, too), and again
changed the new IDs to the old ones.

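The same substitution can be expressed with Python’s ``re`` module. A sketch equivalent to the
Vim pattern above (only two of the ID pairs are shown for brevity; the function name is made up):

```python
import re

# Newer-duplicate-to-original ID pairs; extend with the rest of the
# duplicates found earlier.
ID_MAP = {"15929": "237", "15941": "159"}

# Like the Vim pattern ^[0-9]\+\t\(15929\|15941\): a numeric id column,
# a tab, then one of the duplicate tag IDs followed by another tab.
PATTERN = re.compile(r"^([0-9]+)\t(%s)\t" % "|".join(ID_MAP))


def fix_stats_row(line):
    """Rewrite the tag_id (second column) of an account_tag_stats row."""
    return PATTERN.sub(
        lambda m: "%s\t%s\t" % (m.group(1), ID_MAP[m.group(2)]), line
    )
```

Requiring the trailing tab in the pattern keeps a duplicate ID from matching as a prefix of a
longer, unrelated ID.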
I still can’t believe I didn’t mess up the database too much, but it loaded just fine, and the
``db:migrate`` command executed without errors.

I also had some problems with the nginx reverse proxying, but it all turned out well, and my
instance is back online after a year. Come follow me at `@gergely@polonkai.eu
<https://social.polonkai.eu/@gergely>`_!