Rules for naming files and folders that are cross platform and helpful in datascience
2022-11-03: Corrections made as pointed out by https://fosstodon.org/@rudolf; see this thread: https://fosstodon.org/@swatantra/109281773145979327
Introduction
This post lays down rules for naming files and folders so that the names are cross-platform, meaning they are acceptable on Linux, macOS, and MS Windows, the three major operating systems. I have compiled these rules following international standards and recommendations from the US National Archives. Some rules are not required for OS compatibility and are included for usability.
Suitable file and folder names can be helpful in data science, especially while working on the command line or programmatically querying large sets of files. For example, not having spaces in file names makes it easier to refer to them on the command line; having fixed delimiters in the file names can help iterate over files and fetch critical information programmatically.
These rules are equally applicable outside data science settings, even for your documents, photos, videos, etc.
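As a minimal sketch of what fetching information programmatically can look like, the Python snippet below splits a dated, hyphen-delimited file name into its parts. The function name `parse_dated_name` and the second file name in the list are hypothetical and used only for illustration; the sketch assumes every name starts with a YYYY-MM-DD date.

```python
from datetime import date
from pathlib import PurePosixPath


def parse_dated_name(file_name):
    """Split 'YYYY-MM-DD-some-words.ext' into (date, words, extension)."""
    path = PurePosixPath(file_name)
    # The first three hyphen-separated fields are year, month, and day;
    # whatever follows is the descriptive part of the name.
    year, month, day, *words = path.stem.split("-")
    created = date(int(year), int(month), int(day))
    return created, words, path.suffix.lstrip(".")


names = [
    "2022-08-31-labnotebook-for-hdstim.docx",
    "2022-09-01-labnotebook-for-hdstim.docx",  # hypothetical second file
]
for name in names:
    created, words, extension = parse_dated_name(name)
    print(created.isoformat(), " ".join(words), extension)
```

Because the hyphen is the only delimiter and the date always comes first, a single `split("-")` is enough; no pattern matching or guessing is needed.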
Rules
Use styles that are internationally recognized, easy for the human eye to read, and require fewer finger movements on the keyboard; for example, lowercase letters eliminate the need for the shift key.
- Use all lowercase letters (this is a usability rule, not an OS compatibility requirement).
- Use hyphens (not underscores) to separate words. Avoid using spaces as they are difficult to work with on the command line.
- Use YYYY-MM-DD format for dates.
- When a series of files serves the same purpose and differs only by date, put the date at the beginning of the file name and keep the rest of the name identical across the series.
- Use “.” only for file extensions.
- Use v1, v2, …, vn to denote file versions. Ideally, one should use a version control system (VCS) such as Git.
- For a stack of fewer than 100 files, number them with two digits (00, 01, 02, …, 99) for easier sorting. Likewise, for a stack of fewer than 1000 files, number them with three digits (000, 001, 002, …, 999).
- File names should NOT contain punctuation, symbols, or special characters other than the hyphen and the period before the extension. Examples to avoid: " / [ ] : ; | = _ , < ? > & $ # ! ' { } ( ).
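The rules above can be approximated with a short check in Python. This is only a sketch under stated assumptions: the names `NAME_PATTERN`, `follows_rules`, and `suggest_name` are hypothetical, the pattern allows only ASCII lowercase letters, digits, hyphens, and one extension period, and it does not verify that dates or version labels are well formed.

```python
import re

# Assumed reading of the rules: lowercase letters and digits, words
# separated by single hyphens, and at most one period (the extension).
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*(?:\.[a-z0-9]+)?")


def follows_rules(file_name):
    """Return True if the whole name matches the assumed pattern."""
    return NAME_PATTERN.fullmatch(file_name) is not None


def suggest_name(file_name):
    """Propose a rule-following name; a rough heuristic, not a standard."""
    stem, dot, extension = file_name.rpartition(".")
    if not dot:
        # No period at all, so the whole name is the stem.
        stem, extension = extension, ""
    # Lowercase, then turn every run of disallowed characters into one hyphen.
    stem = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")
    extension = re.sub(r"[^a-z0-9]+", "", extension.lower())
    return f"{stem}.{extension}" if extension else stem


print(follows_rules("2022-08-31-labnotebook-for-hdstim.docx"))
print(follows_rules("Lab Notebook_for HDStim (v2).DOCX"))
print(suggest_name("Lab Notebook_for HDStim (v2).DOCX"))
```

Note that `suggest_name` also replaces non-ASCII letters with hyphens, which may be too aggressive for some collections; treat its output as a starting point to review rather than a final name.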
Some Examples
2022-08-31-labnotebook-for-hdstim.docx
figure-01.png
figure-02.png
figure-03.png
/path/to/folder/exploring-flow
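The numbered figure names above can be generated, and the benefit of zero-padding checked, with a few lines of Python. This is a hedged sketch: the helper `numbered_names` is hypothetical, and it assumes numbering starts at 1 as in the figure examples.

```python
def numbered_names(prefix, count, extension):
    """Build zero-padded names such as figure-01.png ... figure-NN.png."""
    # Pad to the number of digits in the largest index, with a minimum of two.
    width = max(2, len(str(count)))
    return [f"{prefix}-{i:0{width}d}.{extension}" for i in range(1, count + 1)]


print(numbered_names("figure", 3, "png"))

# Without padding, plain alphabetical sorting puts 10 before 2.
print(sorted(["figure-10.png", "figure-2.png", "figure-1.png"]))

# With padding, alphabetical order and numeric order agree.
print(sorted(["figure-10.png", "figure-02.png", "figure-01.png"]))
```

File managers and shell globbing commonly sort names as text, which is why the padded form keeps a stack of files in the intended order.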
Citation
@online{farmer2022,
author = {Farmer, Rohit},
title = {Rules for Naming Files and Folders That Are Cross Platform
and Helpful in Datascience},
date = {2022-11-03},
url = {https://dataalltheway.com/posts/009-rules-for-naming-files},
langid = {en}
}