Database Design – Naming Conventions
Well, that’s a pretty popular line, however, not most Data Architects would agree with Shakespeare on that!
Talking about “Name”, here is an incident that I remember. One day, a new guy in my team created a table and called it “I_WIN_YOU_LOSE”, seriously he did that! And so were the names of the fields in that table. At first sight it was funny, as all the queries running against that table were pretty creative. You can imagine some innovative queries on such a table.
Although all of that earned him the reputation of a “Cool Dude”, it was not possible to follow his style in our Data Warehouse. So, I had to be the geek & explain the team the way we name database objects here…
An observation that I made based on this experience was that when we lay out standards for everyone the progress is rapid. The computer industry has experienced the value of standardization for long. When we standardized on the formatting of a disk drive we got drives that could be read by both Unix and Windows systems, CDs, DVDs, Flash Drives, all have a standard format for information exchange. When we standardized on the TCP/IP protocol we are able to connect all the computers to each other and the internet was born! When we standardized on the HTML protocol the world wide web was born. I could give you many more examples but you get the point. So why can’t we humans follow simple naming conventions? Particularly when it comes to naming table names and column names we suddenly get creative. Its almost like we take for granted an authority to be different.
When it comes to data modeling, especially in a multi-tier, team based, fast growing environment, the “Name” of an object really becomes crucial as it defines the objects. Naming objects becomes much more than just tagging a word to a face. Naming becomes complex when different people have different meanings for the same name and also have different names with the same meaning. Everyone has their own style that comes with the personality, personal preference as well as past experiences, like the one in my earlier story.
Let’s try to understand the principles behind naming conventions with specific industry standards & examples. I am going to make this more generic & not specifically to our environment.
Principles of Naming Conventions:
By combining the words of names in a specific way, standardized data component names are created. The rules will vary for each organization, but the basic principles for developing rule sets are constant.
There are three kinds of rules that form a complete naming convention:
- • Semantic rules are based on the meaning of the words used to describe
- • Syntax rules prescribe the arrangement of words within a name
- • Lexical rules concern the language-related aspects of names
I. Semantic Rules:
These are rules based on the meaning of the words used to name data components.
- • Subjects: entity or subject terms are based on the names of data objects or subjects that are found in data models (entities) or object models (object classes).
- • Modifiers: can be that subject’s properties or qualifiers that are used interchangeably when naming data objects.
- • Class Words: describe the type of data that a column or attribute contains. This is a classification of the type of data, or domain.
II. Syntax Rules:
These rules specify the arrangement of name components. Examples of Syntax Rules are:
- • The subject or object term occupies the leftmost position in the name, unless it is used as a modifier to another subject.
- • Modifier terms follow the subject. The order of the qualifiers in a name is used to make the name complete and clear to the intended audience. Use subject, property and/or qualifier terms as needed.
- • For columns and attributes, the last term should be the class word at the rightmost position.
III. Lexical Rules
These rules determine the standard look of names.
Examples of Lexical Rules are:
- • Nouns are used in singular form
- • Verbs are always in the present tense
- • No special characters are allowed
- • All words are separated by underscores
- • All words are in upper case
- • Listed / approved abbreviations and acronyms
I. Definitions & Common Rules
Entity or Table
An entity is the representation of a distinguishable person, place, thing, concept, event or state that has characteristics, properties and relationships. A table is a physical collection of data about a person, place, thing, concept, event or state. A table may correspond with an entity.
Attribute or Column
A column or attribute contains a specific detail about an entity or table. A column or attribute should not contain multiple values such as arrays or concatenated values.
- • The name of an entity / table or attribute / column should enable its audience to identify and locate it within its context. Therefore, each entity / table or attribute / column name must be unique within its context (an entity within its model, a table within a database schema, an attribute within an entity or a column within a table).
- • The name of an entity / table or attribute / column should be a declaration of the classification of the data it contains or will contain and therefore it should be a noun or noun phrase in the singular form and should follow a classification declarative format.
- • The name of an entity / table or attribute / column should enable designers, developers and business personnel to effectively know what is in it or what to place in it. It should describe its content (what it is), rather than how it is used, processed, populated or decoded.
II. Formats & Examples of Naming Conventions
Entity or table name should be a noun phrase constructed with the following format:
Attribute or column names should be a noun phrase constructed following the format:
Subject Modifier Class
Subject indicates the class of information that the entity or table describes; it provides the proper naming context for the modifier. Subjects are nouns that name things.
Subjects may be composed of several terms or words.
Examples: Employee, Purchase Order, Item etc.
Modifier is an optional component of the entity or table name that further qualifies the name. The modifier is one or more properties and one or more qualifiers.
Examples: Project Installment, Employee Contact etc.
Class or class word classifies the type of information being represented by the column or attribute.
Examples: Employee Number, GL Account Number, Purchase Order Status Code
All of this might sound boring to all the Shakespeare types out there, but I know for sure that all the Data Architects would agree & appreciate these naming conventions!
After all this makes our lives easier & organized and data interchangeable. I will follow this up with a post on data types and data quality rules that change data into information. How to populate missing values? How to check for ranges of values for each field? How to test data quality on the dimensions of completeness, accuracy, consistency, timeliness, etc.