2024-12-25

My personal tips about database design

⚠ Some of the following tips are very divisive and often rise very polarized discussions among DEVs and with DBAs. If so, let’s talk. These are just my opinions end lessons learnt, not stone-written laws.

I’m going to write about the guidelines I often follow for designing OLTP databases. But “context is king”, then some points here must be adapted case by case. A single, catch-all, solution does not applies everywhere and every time. It must be adapted. Nevertheless some approaches work better than other, at least in my personal experience.

In my experience, data in the databases is often the real value of the business and are foundation of it, so treating the databases accordingly very important. Do not consider it as a data swamp or a mud storage. That said, let database do its own work but no more than that. It should store and retrieve data efficiently, guaranteeing integrity and respecting relationships. Higher business logic should be confined in the application, not placed into the database. So, if not strictly required for performance and complexity reasons, do no use stored-procedures and views. Only use tables, relationships and indexes. This approach greatly help the application development and deployment, creating a clear borderline between areas of responsibilities.

Some of the following tips refers to Microsoft SQL Server but can be easily adapted to different RDBMS.

KEEP IN MIND: THESE ARE NOT STRICT RULES BUT TIPS


`#`	TABLES
T.1	Primary Key	always create a primary key on every table
T.2	Surrogate Keys	use surrogate for primary keys whenever possible, e.g. CustomerID (int/long, auto-increment). Consider natural keys only when they are guaranteed immutable or come from proven standards (eg. iso3166).
T.3	Table names	Pascal case (e.g. Order, OrderRow,etc. ). In the battle ‘singular VS plural’, I prefer singular. So [Order] instead of [Orders]. Avoid spaces.
T.4	Field name	Pascal case. E.g. FirstName, LastName, etc.
T.5	Field name for PK and FK	Use “ID” as postfix. For PK field, table name + ID. E.g. OrderID. Same name on both sides of relationship. E.g. Order.OrderID <= OrderRow.OrderID
T.6	Normalization	Normalize but do not over-normalize. Consider real data access patterns and application’s needs.
T.7	Naming	Define naming rules and be consistent across the database.
T.8	Relationships	always define and enforce relationships: foreign keys - primary keys.
T.9	Schemas	do not use schema if not strictly required. Details below
T.10	PK int/bigint	consider if it is better to use BIGINT instead of INT, in particular for fast growing tables.
T.11	Big fields	(n)varchar(max) varbinary(max): use when really required and, if possible, split the table in two tables: the first one with “regular” fields, the second one with “big” fields. On the second table add the same PK, defined as FK point to the first table, to create 1-to-1 relationship.
T.12	FK naming	Suggestion: FK_ParentTable_ChildTable

`#`	INDEXES
I.1	Clustered index	99% of times primary key index is also clustered, but not always. Consider it carefully.
I.2	Tradeoffs	Be very careful when adding an index. An index is always a cost for the database. It makes insert/update/delete of a record, slower. Find a good tradeoff between fast lookup and cost of the index.
I.3	Foreign keys	Foreign keys fields must be indexed.
I.4	Fields to be indexed	In general, consider to index fields heavily used in WHERE conditions. But remember point I.2
I.5	Statistics	Indexes are important but they are effective only with updated statistics.
I.6	Naming	Define naming rules and be consistent across the database. My suggestion: IX_TableName_FieldName(s)

`#`	MISC
M.1	Stored procedures	Avoid if not strictly required.
M.2	Views	Avoid if not strictly required.
M.3	Triggers	Avoid if not strictly required.

`#`	SECURITY
S.1	Authentication	if possible, use Integrated/Windows authentication for human-users and application-users
S.2	Permissions	give the user the minimum permissions required. 99% of the times, db_datareader (SELECT) + db_datawriter (INSERT, UPADTE, DELETE) is enough and the right choice. If the application only needs to read data, just give db_datareader.

`#`	DATABASE GENERAL
G.1	Environments	Add a suffix for identifying the environment of the database. E.g. AdventureWorks-test, AdventureWorks-cert, AdventureWorks-prod.
G.2	Simple / Full mode	Discuss the topic with your DBA, carefully considering realistic business requirements. Full requires more maintenance than Simple.
G.3	Read Committed Snapshot	Read Committed Snasphot option ON
G.4	Index reorganize / rebuild	Ask your DBA for an automatic task for reorganizing indexes. And for rebuilding them, if required. Be careful about performance impacts.
G.5	Statistics update	Ask your DBA for an automatic task for updating them. It is often more important updating statistics than reorganizing indexes. Note: automatic statistics recompute is not always effective and you need a task.
G.6	Backup	ask your DBA. And pretend regular restore tests. It is completely pointless making backup resulting non-working/corrupted when a restore is needed.

`#`	DESIGN/PROJECT
D.1	Project/Tracking	Always track database structure evolution over time. Details below
D.2	Design responsibility	In general, software engineers are responsible of database schema design. But with the assistance of the DBAs for reviewing the tables/relationships/indexes structure in particular if software engineers’ skills on databases are shallow.
D.3	System responsibility	DBAs are always in charge of database’s files organization, disks configuration, system administration and maintenance tasks: backups, index reorganize, statistics update, underlying OS operations, sys monitoring, etc.

T.9

In general ORMs does not play well with them. Instead, use pseudo-schema name as prefix in table name. E.g. Order, OrderRow, OrderPayment, OrderShipment, etc. Or AA_User, AA_Group, AA_Permission, etc. And, if you really need schema, never use the same names under different schema. So avoid things like Orders.Detail Billing.Detail.

D.1

Use Visual Studio 2022 Database project to store and track evolution of database on GIT.
If you don’t like or don’t feel confident in writing tsql directly in VS2022 for designing your database, do it in Sql Server Management Studio SSMS using a LocalDB database. Then, use a SCMP between the LocalDB database and the VS2022 project, to transfer modification to the VS database project. “SSMS + VS2022 + SCMP file”

For software developers

A note for developers: use ORMs, like Entity Framework, whenever possible but not always. Don’t be stubborn. Do not force yourself using them even when performance are not the desired one or if the complexity of the query does not fix well in the ORM. In the same app, you can use ORMs and plain SQL.