Just recently I spent a few hours attempting to make an overly complex SQL query into a reality. From the beginning, it didn't really feel right. The data was slightly off, and there was this nagging feeling hovering that it wasn't going to end well. It wasn't that the query itself was complex. But more that the data wasn't very well stored, formatted and queryable. But nonetheless, because I began to solve it with a data-based mentality and because I had dedicated so much time on a query that would take years to describe I decided to continue the relentless battle. At the end, I ended up with a query that would be forgotten within a 15 minute timespan and with a dataset that might or might not be correct. There was very little I could do to verify whether a 2 paragraph query was valid or not.
code to the rescue
And then, when all seemed lost, I had a realization. It was a subtle realization, but an important one. And I had to go through this losing battle in order to see it. I know how to program (I'm pretty sure). I know how to read from a database and how to write to a database using code. In code I can do whatever I wish. I can read each record, and take each field, and process each field as I need to, all within a couple of lines.
It's just that it took me a while to realize this. So today I'm going to go over that timeless battle of SQL vs Code. And when to use what when, and why.
When to use SQL
SQL is amazing. You can get instant data exactly how you want it in under a second. And you can refine that dataset further and further with a single line. If your data is normalized well, referential integrity is in place and you can query without using many of the built in string manipulation functions, such as SUBSTRING or CHARINDEX, then you should use SQL.
Good database normalization
But again, the main problem isn't the SQL query per say. But more the way that your data is stored. If you're having to use string manipulation functions (as I did) in order to query and filter data, then your data isn't as broken down as it could be. String manipulation functions in SQL have their use and place, such as reporting. If you need to dynamically create a URL or maybe locate just a portion of an identifier to determine another variable for example, then for sure SQL is the route to go. But if you want to find out which cities are the most popular in your database, but you need to figure it out based on the following:
||123 fake street, CA
||234 another street, California
||456 main st, Cali
If this is the case, then you are going to have a bad time. And the only recommendation I would have here, is to transform the data into something more workable. But not to spend any considerable amount of time attempting to extract data just for a query.
For large data sets
If an operation takes a second to complete, it's not that much time generally. Unless we're talking satellite communication, in which, yes..yes it is alot of time. But also, if we're talking a million records, then also, this is alot of time. This has happened to me a few times, and normally I was unaware of the time to run. Until after several hours of waiting, I did a simple multiplication just to see that the operation would have completed in several weeks.
SQL will always be much faster than code. Indexing algorithms are there for a reason and if your database tables are well indexed, something that can take a snippet of code hours, could be done in minutes. So before you leave SQL behind for a function, do a single operation in order to get an idea of the time to complete.
When to code
When you code with data, you're essentially still using SQL. The only difference, is that you won't be processing and filtering the data using SQL. You'll be using custom and native functions in order to get a desired effect. Some things are not simple to do with SQL, but they are a breeze in code. For example, one of the records that I was working with recently had a particular identifier that was needed in order to determine other records in that set. That identifier was mixed in with a longer token which was a string.
SQL server does indeed give us the ability to retrieve Substrings, however, it needs to know the length of said substring. Of which we can use CharIndex along with some clever math. Doable, but this will require us to double or even triple the size of our queries depending on whether we're involving subqueries or grouping.
When you can complete it in less than a day
As I mentioned above, calculate how much time it takes for 1 single operation. This is huge and will determine whether it is feasible at all to continue. Then make the determination whether the time to complete outweighs the time to write your code.
When you know how to do it in code
Just because you can't figure something out in SQL does not mean that you are a terrible programmer or DBA, or analyst, or whichever other title deals with data. It just means that you went down a road that got darker and darker, more full of brush and fallen tree bridges and that you're too far in to go back. To go back and to use the 21st century bridge that's right next door. Sometimes, it just clicks in code. You want this part of a string, and you want to use it to get this other part. One quick substring, followed by a second query is all that's required.
When accuracy matters
The granular control that you get with code let's you get a much higher overview of your data. You can store as much data as you need in variables (nowadays) and then use that same data for other queries or calculations. The important part really is that it makes sense to you. That you can get 10,000 records and know that those 10,000 records are exactly what you're looking for.
Weigh your options
There is no direct right or wrong way to retrieve data. The right way is the way that's easier for you to accomplish, gets accurate results, and that will make sense tomorrow when you look at it again. The results will always be the same, if done correctly.
SELECT * FROM Post WHERE State = 'CA';
function GetPostFromState(string state)
var posts = GetPosts();
foreach(var post in posts)
if (post.State == 'CA')
// store data