<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:series="http://unfoldingneurons.com/"
		>
<channel>
	<title>Comments on: Designing Databases: Picking The Right Data Types</title>
	<atom:link href="http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/</link>
	<description>The personal blog of Brandon Savage. Contains entries of a personal and professional nature focusing on PHP, Apple, LAMP, MySQL and Washington, DC.</description>
	<lastBuildDate>Fri, 03 Feb 2012 19:36:33 -0500</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Purencool</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2262</link>
		<dc:creator>Purencool</dc:creator>
		<pubDate>Tue, 24 Nov 2009 01:33:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2262</guid>
		<description>Thanks for the great article. I have always wondered about the topic you discussed and what happens in large data sets.</description>
		<content:encoded><![CDATA[<p>Thanks for the great article. I have always wondered about the topic you discussed and what happens in large data sets.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brice Burgess</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2261</link>
		<dc:creator>Brice Burgess</dc:creator>
		<pubDate>Tue, 24 Nov 2009 00:36:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2261</guid>
		<description>Fine introduction, although I side with an enum-happy approach, since it does a good job of intrinsically documenting the schema. I would LOVE to hear more about approaches to handling DATES, specifically dates before 1970.

Keep up the good posts :)</description>
		<content:encoded><![CDATA[<p>Fine introduction, although I side with an enum-happy approach, since it does a good job of intrinsically documenting the schema. I would LOVE to hear more about approaches to handling DATES, specifically dates before 1970.</p>
<p>Keep up the good posts :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hodicska Gergely</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2202</link>
		<dc:creator>Hodicska Gergely</dc:creator>
		<pubDate>Sat, 21 Nov 2009 23:36:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2202</guid>
		<description>&quot;if you allow for usernames up to 12 characters, your VARCHAR should not be 255 characters wide.&quot;
This is not true: varchar(12) and varchar(255) cost the same, because the length is stored in one byte in both cases. The change comes when you pass 255. It is good practice to use a larger value (still under 255) with varchar if you are not sure that you won&#039;t need more, to avoid an unnecessary schema change.

&quot;if you have 3 million records, could take you a week or two (hyperbole).&quot;
This is very, very hyperbolic ;). It usually takes a few minutes, though of course this depends on the given setup.

&quot;It’s poor schema design, because you’re expecting the database to do what PHP should have done for you.&quot;
I disagree with you: you should defend your data model at the database level too, because your code may not be the only code that interacts with the database. Of course you should consider other factors too, like the cost of a possible schema modification, but with an MMM setup, for example, you can do this even on a running system without an outage.

&quot;See slide 13 of the Drunken Query Master talk where he talks about having to rebuild the table&quot;
I am a big fan of Jay Pipes, but you should believe the manual too. ;)</description>
		<content:encoded><![CDATA[<p>&#8220;if you allow for usernames up to 12 characters, your VARCHAR should not be 255 characters wide.&#8221;<br />
This is not true: varchar(12) and varchar(255) cost the same, because the length is stored in one byte in both cases. The change comes when you pass 255. It is good practice to use a larger value (still under 255) with varchar if you are not sure that you won&#8217;t need more, to avoid an unnecessary schema change.</p>
<p>&#8220;if you have 3 million records, could take you a week or two (hyperbole).&#8221;<br />
This is very, very hyperbolic ;). It usually takes a few minutes, though of course this depends on the given setup.</p>
<p>&#8220;It’s poor schema design, because you’re expecting the database to do what PHP should have done for you.&#8221;<br />
I disagree with you: you should defend your data model at the database level too, because your code may not be the only code that interacts with the database. Of course you should consider other factors too, like the cost of a possible schema modification, but with an MMM setup, for example, you can do this even on a running system without an outage.</p>
<p>&#8220;See slide 13 of the Drunken Query Master talk where he talks about having to rebuild the table&#8221;<br />
I am a big fan of Jay Pipes, but you should believe the manual too. ;)</p>
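<p>The one-byte versus two-byte length prefix discussed in this comment can be sketched in Python. This is only a rough model of the rule in the MySQL storage-requirements manual (a value takes its byte length plus 1 when the column can hold at most 255 bytes, plus 2 otherwise). The function names, the sample username, and the max_bytes_per_char parameter are hypothetical and exist only for illustration.</p>

```python
def length_prefix_bytes(max_chars: int, max_bytes_per_char: int = 1) -> int:
    """Bytes used for the VARCHAR length prefix (assumed model: 1 or 2)."""
    max_bytes = max_chars * max_bytes_per_char
    return 1 if max_bytes <= 255 else 2


def varchar_stored_bytes(value: str, max_chars: int,
                         encoding: str = "latin-1",
                         max_bytes_per_char: int = 1) -> int:
    """Approximate bytes for one value: encoded data plus length prefix."""
    if len(value) > max_chars:
        raise ValueError("value does not fit in the declared column width")
    data_bytes = len(value.encode(encoding))
    return data_bytes + length_prefix_bytes(max_chars, max_bytes_per_char)


# Hypothetical 7-character username in a single-byte character set.
username = "brandon"
size_in_varchar_12 = varchar_stored_bytes(username, 12)
size_in_varchar_255 = varchar_stored_bytes(username, 255)
size_in_varchar_256 = varchar_stored_bytes(username, 256)

print(size_in_varchar_12, size_in_varchar_255, size_in_varchar_256)
```

<p>Under these assumptions the same value costs the same in VARCHAR(12) and VARCHAR(255), and one byte more once the declared width passes 255 bytes. With a multi-byte character set the two-byte prefix starts at a smaller declared width, which is what max_bytes_per_char is meant to model.</p>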
]]></content:encoded>
	</item>
	<item>
		<title>By: Herman Radtke</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2180</link>
		<dc:creator>Herman Radtke</dc:creator>
		<pubDate>Sat, 21 Nov 2009 05:30:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2180</guid>
		<description>No, one of the most crucial and most overlooked components of database development is the use of natural keys.  The use of auto_increment and uuid/guid should be minimal.

The information on TEXT data types is just plain wrong.  Look at http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html for reference.  The only real difference between VARCHAR and TEXT is that VARCHAR has a length constraint.

Hard drives are cheap.  RAID, proper indexes and caching are proper solutions to I/O bottlenecks.

For a discussion on what really matters in SQL design, I would suggest checking out Joe Celko&#039;s book on SQL Style: http://www.amazon.com/exec/obidos/ASIN/0120887975</description>
		<content:encoded><![CDATA[<p>No, one of the most crucial and most overlooked components of database development is the use of natural keys.  The use of auto_increment and uuid/guid should be minimal.</p>
<p>The information on TEXT data types is just plain wrong.  Look at <a href="http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html</a> for reference.  The only real difference between VARCHAR and TEXT is that VARCHAR has a length constraint.</p>
<p>Hard drives are cheap.  RAID, proper indexes and caching are proper solutions to I/O bottlenecks.</p>
<p>For a discussion on what really matters in SQL design, I would suggest checking out Joe Celko&#8217;s book on SQL Style: <a href="http://www.amazon.com/exec/obidos/ASIN/0120887975" rel="nofollow">http://www.amazon.com/exec/obidos/ASIN/0120887975</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jesper Wisborg Krogh</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2170</link>
		<dc:creator>Jesper Wisborg Krogh</dc:creator>
		<pubDate>Sat, 21 Nov 2009 01:28:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2170</guid>
		<description>One more thing to be aware of is that the way data types are handled varies between storage engines. InnoDB effectively stores both CHAR and VARCHAR the same way. Also, InnoDB (at least in MySQL 5.0) uses four bytes to store a medium int (and I think a small int as well), in which case I believe it is better to choose an int.

Regarding enums and sets, I prefer not to use them and would rather use a reference table (and, for sets, a join table), as it is my impression that those data types are MySQL-specific.</description>
		<content:encoded><![CDATA[<p>One more thing to be aware of is that the way data types are handled varies between storage engines. InnoDB effectively stores both CHAR and VARCHAR the same way. Also, InnoDB (at least in MySQL 5.0) uses four bytes to store a medium int (and I think a small int as well), in which case I believe it is better to choose an int.</p>
<p>Regarding enums and sets, I prefer not to use them and would rather use a reference table (and, for sets, a join table), as it is my impression that those data types are MySQL-specific.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave Rowe</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2167</link>
		<dc:creator>Dave Rowe</dc:creator>
		<pubDate>Fri, 20 Nov 2009 21:15:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2167</guid>
		<description>Interesting points.  Though, in most cases, schema changes aren&#039;t necessarily organic, so a design may need to handle the &#039;what ifs&#039; and potential growth issues.  Deciding on SMALLINT versus BIGINT because of a few bytes isn&#039;t really a concern, because the difference is only really seen in _large_ databases, at which point the question is moot, since you&#039;ll need BIGINTs to handle those millions of rows.

Design ideas themselves should also be organic, as was demonstrated by your recommendation to avoid ENUMs because of table rebuilds.  Advancements in the technology allow for changes without the rebuilds, as referenced by FractalizeR, so the resulting point should be: make the best decision given what the underlying technology supports, but be willing to be flexible on old rules.

Good post though!  Keep &#039;em coming.</description>
		<content:encoded><![CDATA[<p>Interesting points.  Though, in most cases, schema changes aren&#8217;t necessarily organic, so a design may need to handle the &#8216;what ifs&#8217; and potential growth issues.  Deciding on SMALLINT versus BIGINT because of a few bytes isn&#8217;t really a concern, because the difference is only really seen in _large_ databases, at which point the question is moot, since you&#8217;ll need BIGINTs to handle those millions of rows.</p>
<p>Design ideas themselves should also be organic, as was demonstrated by your recommendation to avoid ENUMs because of table rebuilds.  Advancements in the technology allow for changes without the rebuilds, as referenced by FractalizeR, so the resulting point should be: make the best decision given what the underlying technology supports, but be willing to be flexible on old rules.</p>
<p>Good post though!  Keep &#8217;em coming.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Crumm</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2166</link>
		<dc:creator>Michael Crumm</dc:creator>
		<pubDate>Fri, 20 Nov 2009 20:51:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2166</guid>
		<description>Brandon,

Thanks for the post - I&#039;ve often wondered about the difference between certain MySQL data types, and this writeup did a nice job of explaining them.</description>
		<content:encoded><![CDATA[<p>Brandon,</p>
<p>Thanks for the post &#8211; I&#8217;ve often wondered about the difference between certain MySQL data types, and this writeup did a nice job of explaining them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2164</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Fri, 20 Nov 2009 16:19:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2164</guid>
		<description>Database-enforced constraints (like enum values) are, quite frankly, safer. When the database itself does the constraining, you (or another programmer) won&#039;t inadvertently screw up data with bad code.

An acceptable compromise, I think, is to enforce enum values in a db layer. If the db layer knows it&#039;s an &#039;enum&#039; and what the accepted values are then no one can mess up the data*.

(Doctrine ORM defaults to simulated enums with varchar columns as not every DBMS supports enums).

* So then you run into possible issues when/if you do any raw SQL on your database (from CLI, scripts, whatever).</description>
		<content:encoded><![CDATA[<p>Database-enforced constraints (like enum values) are, quite frankly, safer. When the database itself does the constraining, you (or another programmer) won&#8217;t inadvertently screw up data with bad code.</p>
<p>An acceptable compromise, I think, is to enforce enum values in a db layer. If the db layer knows it&#8217;s an &#8216;enum&#8217; and what the accepted values are then no one can mess up the data*.</p>
<p>(Doctrine ORM defaults to simulated enums with varchar columns as not every DBMS supports enums).</p>
<p>* So then you run into possible issues when/if you do any raw SQL on your database (from CLI, scripts, whatever).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2163</link>
		<dc:creator>Jon</dc:creator>
		<pubDate>Fri, 20 Nov 2009 15:44:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2163</guid>
		<description>I think the bigger problem here, Brandon, is that you&#039;re trying to tell people how to design DBs.  No offense, but you aren&#039;t a DBA, and DBAs should be the ones to make those decisions, not developers.

When developers design tables and databases, they design them to suit the needs of their application.  This is a very bad philosophy to follow.  But if you must create your own tables because you don&#039;t have a DBA, I would highly recommend that you learn the normalization rules and get some books on MySQL best practices.

Some of this article clears up confusion that intermediate developers will encounter.  Some of it makes assumptions based on &quot;well..he said this so it must be true&quot;.  I think that unless you fully understand the issue and its causes, you shouldn&#039;t be writing articles as a &quot;matter of fact&quot; when it&#039;s a &quot;matter of hearsay&quot;.</description>
		<content:encoded><![CDATA[<p>I think the bigger problem here, Brandon, is that you&#8217;re trying to tell people how to design DBs.  No offense, but you aren&#8217;t a DBA, and DBAs should be the ones to make those decisions, not developers.</p>
<p>When developers design tables and databases, they design them to suit the needs of their application.  This is a very bad philosophy to follow.  But if you must create your own tables because you don&#8217;t have a DBA, I would highly recommend that you learn the normalization rules and get some books on MySQL best practices.</p>
<p>Some of this article clears up confusion that intermediate developers will encounter.  Some of it makes assumptions based on &#8220;well..he said this so it must be true&#8221;.  I think that unless you fully understand the issue and its causes, you shouldn&#8217;t be writing articles as a &#8220;matter of fact&#8221; when it&#8217;s a &#8220;matter of hearsay&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2161</link>
		<dc:creator>Ben</dc:creator>
		<pubDate>Fri, 20 Nov 2009 12:30:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2161</guid>
		<description>I don&#039;t claim to be any kind of MySQL or PHP authority; however, removing dependencies between database and application is a basic design principle - surely normalising the enum into a separate table is a better alternative to - presumably - using PHP constants to achieve the same result?

With regard to rebuilding the table - even if this is the case, it&#039;s a single operation, and not something that will affect the overall performance of the database.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t claim to be any kind of MySQL or PHP authority; however, removing dependencies between database and application is a basic design principle &#8211; surely normalising the enum into a separate table is a better alternative to &#8211; presumably &#8211; using PHP constants to achieve the same result?</p>
<p>With regard to rebuilding the table &#8211; even if this is the case, it&#8217;s a single operation, and not something that will affect the overall performance of the database.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: FractalizeR</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2160</link>
		<dc:creator>FractalizeR</dc:creator>
		<pubDate>Fri, 20 Nov 2009 12:25:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2160</guid>
		<description>Dear Brandon, with all respect, I think blindly following respected people is bad practice. Those slides you mention date back to 2008. And they are correct: in MySQL 5.0, there WAS a problem with ENUM and SET alterations leading to table rebuilds. Fast ALTER TABLE, as I understand it, was implemented in 5.1 (there is no such entry in the 5.0 manual; it appeared only in 5.1).</description>
		<content:encoded><![CDATA[<p>Dear Brandon, with all respect, I think blindly following respected people is bad practice. Those slides you mention date back to 2008. And they are correct: in MySQL 5.0, there WAS a problem with ENUM and SET alterations leading to table rebuilds. Fast ALTER TABLE, as I understand it, was implemented in 5.1 (there is no such entry in the 5.0 manual; it appeared only in 5.1).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brandon Savage</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2159</link>
		<dc:creator>Brandon Savage</dc:creator>
		<pubDate>Fri, 20 Nov 2009 12:16:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2159</guid>
		<description>Ben and FractalizeR, I&#039;m sorry, but when it&#039;s Jay Pipes (who works for MySQL) against you, I&#039;m going to go with him every time. See slide 13 of the Drunken Query Master talk where he talks about having to rebuild the table.

It&#039;s poor schema design, because you&#039;re expecting the database to do what PHP should have done for you.</description>
		<content:encoded><![CDATA[<p>Ben and FractalizeR, I&#8217;m sorry, but when it&#8217;s Jay Pipes (who works for MySQL) against you, I&#8217;m going to go with him every time. See slide 13 of the Drunken Query Master talk where he talks about having to rebuild the table.</p>
<p>It&#8217;s poor schema design, because you&#8217;re expecting the database to do what PHP should have done for you.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2158</link>
		<dc:creator>Ben</dc:creator>
		<pubDate>Fri, 20 Nov 2009 10:39:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2158</guid>
		<description>Personally, I don&#039;t see the problem with using enum - if you have a case where a field can be one of two or three values, it keeps your data readable without having to cross-reference your code in order to understand exactly what is in a particular row.

If you must forbid the use of enums, the best alternative is to normalise the table and extract the enum field into a separate table - that way you can add an extra value easily - and it removes the dependency between your database and your code.

Regardless, I disagree with you on the inconvenience of adding an extra enum option at a later stage. It is a single operation and, even if it takes several minutes, you can deploy it at a time when the server isn&#039;t busy - if you must deploy to a live database. Surely long-term maintainability is more of a consideration than the length of time it will take to perform a single deployment?</description>
		<content:encoded><![CDATA[<p>Personally, I don&#8217;t see the problem with using enum &#8211; if you have a case where a field can be one of two or three values, it keeps your data readable without having to cross-reference your code in order to understand exactly what is in a particular row.</p>
<p>If you must forbid the use of enums, the best alternative is to normalise the table and extract the enum field into a separate table &#8211; that way you can add an extra value easily &#8211; and it removes the dependency between your database and your code.</p>
<p>Regardless, I disagree with you on the inconvenience of adding an extra enum option at a later stage. It is a single operation and, even if it takes several minutes, you can deploy it at a time when the server isn&#8217;t busy &#8211; if you must deploy to a live database. Surely long-term maintainability is more of a consideration than the length of time it will take to perform a single deployment?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: FractalizeR</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2157</link>
		<dc:creator>FractalizeR</dc:creator>
		<pubDate>Fri, 20 Nov 2009 09:36:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2157</guid>
		<description>&gt;If you ever decide you have to add something else to the ENUM or SET declaration, MySQL must rebuild the entire table which, if you have 3 million records, could take you a week or two (hyperbole). This is clearly not optimal.

This is quite incorrect. If you add new values TO THE END of the ENUM or SET declaration, a table rebuild is not needed: http://dev.mysql.com/doc/refman/5.1/en/alter-table.html (see fast table alterations).</description>
		<content:encoded><![CDATA[<p>&gt;If you ever decide you have to add something else to the ENUM or SET declaration, MySQL must rebuild the entire table which, if you have 3 million records, could take you a week or two (hyperbole). This is clearly not optimal.</p>
<p>This is quite incorrect. If you add new values TO THE END of the ENUM or SET declaration, a table rebuild is not needed: <a href="http://dev.mysql.com/doc/refman/5.1/en/alter-table.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.1/en/alter-table.html</a> (see fast table alterations).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dominik</title>
		<link>http://www.brandonsavage.net/designing-databases-picking-the-right-data-types/#comment-2155</link>
		<dc:creator>Dominik</dc:creator>
		<pubDate>Fri, 20 Nov 2009 08:57:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.brandonsavage.net/?p=1032#comment-2155</guid>
		<description>One thing that I don&#039;t understand - yet - is why there are no &quot;hexdata&quot; or &quot;uuid&quot; fields. For example, you mention md5 hashes and say that they are always CHAR(32). Sadly, that&#039;s true, because it&#039;s the only existing way to store them at this moment.

However, an md5 hash is actually a sequence of 16 byte-sized hex numbers, which could be stored in a 16-byte &quot;BSOB&quot; (as in &quot;Binary SMALL Object&quot; :) - or &quot;BIN&quot; or &quot;HEXDATA&quot; or whatever you might want to call it). The same applies to UUIDs (which are also 16 bytes, just with some dashes in the hex representation).

Why is there no data type to put those in?</description>
		<content:encoded><![CDATA[<p>One thing that I don&#8217;t understand &#8211; yet &#8211; is why there are no &#8220;hexdata&#8221; or &#8220;uuid&#8221; fields. For example, you mention md5 hashes and say that they are always CHAR(32). Sadly, that&#8217;s true, because it&#8217;s the only existing way to store them at this moment.</p>
<p>However, an md5 hash is actually a sequence of 16 byte-sized hex numbers, which could be stored in a 16-byte &#8220;BSOB&#8221; (as in &#8220;Binary SMALL Object&#8221; :) &#8211; or &#8220;BIN&#8221; or &#8220;HEXDATA&#8221; or whatever you might want to call it). The same applies to UUIDs (which are also 16 bytes, just with some dashes in the hex representation).</p>
<p>Why is there no data type to put those in?</p>
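<p>The size difference described in this comment can be checked with a minimal Python sketch: the raw md5 digest and the raw UUID are each 16 bytes, while their text forms take 32 and 36 characters. The payload and variable names are hypothetical. Whether the raw form can be stored directly depends on the database offering a fixed-width binary column (for example a 16-byte binary type), which is an assumption here and not something the article covers.</p>

```python
import hashlib
import uuid

# Hypothetical input; any byte string gives the same sizes.
payload = b"example payload"

md5_hex = hashlib.md5(payload).hexdigest()  # text form, what CHAR(32) holds
md5_raw = hashlib.md5(payload).digest()     # raw bytes of the same hash

some_uuid = uuid.uuid4()
uuid_text = str(some_uuid)                  # hex digits plus four dashes
uuid_raw = some_uuid.bytes                  # raw bytes of the same UUID

sizes = {
    "md5_hex": len(md5_hex),
    "md5_raw": len(md5_raw),
    "uuid_text": len(uuid_text),
    "uuid_raw": len(uuid_raw),
}
print(sizes)

# The raw and text forms convert back and forth without loss.
md5_round_trip = bytes.fromhex(md5_hex) == md5_raw
uuid_round_trip = uuid.UUID(bytes=uuid_raw) == some_uuid
print(md5_round_trip, uuid_round_trip)
```

<p>The conversion is lossless in both directions, so the trade-off is mainly readability of the stored value against half the storage per row.</p>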
]]></content:encoded>
	</item>
</channel>
</rss>

