We’re in the process of migrating some data. We have a few tens of thousands of records totaling approximately 1GB. This is a normal data migration process, right?

  1. Download data from the source APIs.
  2. Save the records in a database. SQLite is great for this.
  3. Import them into the upgraded system.
  4. Profit!

But why the database? Perhaps there is a simpler solution.

The records have a nice nested structure.

  • Conversations
    • Messages
      • Attachments

This hierarchy leans me toward a document-oriented storage scheme. But why do we always lean on a database for storing our data? Remember the flat-file database? I felt it was time to bring that back.

In Python, there are already modules like FlatPages for Django and Flask, but we’re going for simple.

import os


class FlatFileDb:
    """
    Maintains the following structure:

    /database-name
      /channels
        /channel-id
          /messages
            /message-id
              message.json
              /attachments
                /attachment-id
                  attachment.json
                  asset
    """

    def __init__(self, base_path):
        # Root directory of the flat-file database.
        self.db = base_path

    def _create(self, p):
        # Create the directory (and any missing parents), then return its path.
        os.makedirs(p, exist_ok=True)
        return p

    # Note: each getter below creates the directory as a side effect.

    def get_channel_path(self, channel_id):
        return self._create(os.path.join(self.db, "channels", channel_id))

    def get_message_path(self, channel_id, message_id):
        return self._create(os.path.join(self.get_channel_path(channel_id), "messages", message_id))

    def get_attachment_path(self, channel_id, message_id, attachment_id):
        return self._create(os.path.join(self.get_message_path(channel_id, message_id), "attachments", attachment_id))
There are a few simple principles here:

  • Directories map to collections of objects.
  • The name of each directory is the key or id of that object.
  • Each directory can have “sub-objects” which themselves are directories.
  • Files are records.
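
The principles above can be sketched in a few helper functions. This is a minimal illustration, not part of the original class: the names save_record, load_record, and list_ids are hypothetical, and it assumes each record is a JSON-serializable dict.

```python
import json
import os


def save_record(directory, filename, record):
    """Write one record (a dict) as JSON into `directory`, creating it if needed."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
    return path


def load_record(directory, filename):
    """Read a JSON record back from `directory`."""
    with open(os.path.join(directory, filename), encoding="utf-8") as f:
        return json.load(f)


def list_ids(collection_dir):
    """The ids in a collection are simply the names of its subdirectories."""
    if not os.path.isdir(collection_dir):
        return []
    return sorted(
        name
        for name in os.listdir(collection_dir)
        if os.path.isdir(os.path.join(collection_dir, name))
    )
```

With this, saving a message is a matter of calling save_record on the message directory with "message.json", and listing the messages in a channel is list_ids on its "messages" directory.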

While this system doesn’t require it, you can also encode tuple storage into files: a key/value system can be implemented where filenames are the keys and file contents are the values.
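
As a rough sketch of that key/value idea, here is one way it might look. The class name FileKeyValueStore and its methods are hypothetical, and it assumes keys are plain filenames and values are text.

```python
import os


class FileKeyValueStore:
    """Hypothetical key/value store: one file per key, file content is the value."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Reject keys that would escape the store directory.
        if not key or key in (".", "..") or os.sep in key or "/" in key:
            raise ValueError(f"invalid key: {key!r}")
        return os.path.join(self.directory, key)

    def put(self, key, value):
        with open(self._path(key), "w", encoding="utf-8") as f:
            f.write(value)

    def get(self, key, default=None):
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return default

    def keys(self):
        # Every filename in the directory is a key.
        return sorted(os.listdir(self.directory))
```

The key check matters because a key is used directly as a filename; without it, a key such as "../escape" would write outside the store.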

We now have a system where we can store and manipulate our data. But can we query it? Absolutely.

How many messages?

-- SQL
SELECT COUNT(*) FROM prod_db.messages;

-- Flat File
find db -name message.json | wc -l
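
If the count is needed from inside the migration script rather than the shell, the same query can be sketched in Python. The function name count_messages is hypothetical; it assumes the directory layout described earlier.

```python
from pathlib import Path


def count_messages(db_root):
    """Count message.json entries anywhere under db_root, like `find ... | wc -l`."""
    return sum(1 for _ in Path(db_root).rglob("message.json"))
```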

Full-text search is easy too. No need to install fancy column types or extensions. Just use grep.
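
For cases where the search has to run from Python, a rough equivalent of a recursive, case-insensitive grep might look like this. The function name search_messages is hypothetical, and it assumes the files are UTF-8 text. Like grep, it searches the raw JSON, so it also matches key names.

```python
from pathlib import Path


def search_messages(db_root, term):
    """Return paths of message.json files whose text contains `term` (case-insensitive)."""
    needle = term.lower()
    matches = []
    for path in sorted(Path(db_root).rglob("message.json")):
        # errors="replace" keeps the scan going if a file has stray bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            matches.append(path)
    return matches
```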

What about archiving the data?

-- SQL
-- Custom proprietary commands

-- Flat File
tar cf database.tar db
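
The same archive can be produced from Python with the standard tarfile module. This is a sketch under the assumption that the archive is written outside the database directory; the function name archive_db is hypothetical.

```python
import os
import tarfile


def archive_db(db_root, archive_path):
    """Write an uncompressed tar of db_root, like `tar cf archive_path db_root`."""
    with tarfile.open(archive_path, "w") as tar:
        # arcname stores paths relative to the database directory name.
        tar.add(db_root, arcname=os.path.basename(os.path.normpath(db_root)))
    return archive_path
```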

Traditional databases provide a swath of elegant features, including advanced querying, schemas to provide data consistency, and indexes to make queries fast. However, sometimes that’s all overkill. Sometimes, just write some files.