How to validate email addresses

Validating data is key both to a good user experience and to keeping your databases clean. And while the HTML standard has more recently integrated its own browser-based validation for such things as numbers and emails, oftentimes you will want a custom approach to go along with your brand and your theme.

In this article, I will go over the main ways in which you can validate emails, including the built-in HTML methods that browsers provide. In fact, let's start there, as it is the simplest to integrate.

Browser-based validation

The HTML specification allows for input fields to be set to an 'email' type.

<input type='email' />

When the form is submitted, if the browser detects that the field isn't a properly formatted email address, a notification will pop up to inform the user.

While this method works just fine for standard workflows (i.e. user fills in the form -> user hits enter -> done), it offers little flexibility out of the box. You essentially get the same pop-up window and the same notification message regardless of the issue.

And it is up to the browser vendor to implement this for us. The above screenshot was from a Firefox session for example. On Chrome, the message would look something like the following:

Definitely more helpful. But again, the design and styling are left up to the browser.
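
The wording of that message is one of the few things a page can adjust, through the browser's Constraint Validation API. The sketch below is only an illustration: the function name useCustomEmailMessage, the element id and the message text are made up for this example, and the bubble itself is still drawn by the browser.

```javascript
// Hypothetical helper: replaces the browser's default email error text.
// `input` is expected to be an <input type="email"> element.
function useCustomEmailMessage(input, message) {
  // validity.typeMismatch is true when the value is not a well-formed email.
  if (input.validity.typeMismatch) {
    input.setCustomValidity(message);
    return false;
  }
  // An empty string clears the custom error so the field counts as valid again.
  input.setCustomValidity('');
  return true;
}

// Example wiring in a page (assumes an element with id "email" exists):
// const field = document.getElementById('email');
// field.addEventListener('input', () =>
//   useCustomEmailMessage(field, 'Please enter an address like name@example.com'));
```

Re-running the helper on every input event matters here, because a custom error stays set until it is cleared with an empty string.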

Regex approach

Regular expressions are ideal for this type of pattern-matching scenario. What are regular expressions? Essentially, they are sequences of characters that define a search pattern to match against other strings.

There are plenty of expressions that you will find online that aim to validate emails. This particular one from https://emailregex.com/ is advertised as working 99.99% of the time.

/^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

A few things to note. First, regular expressions are not pretty to read. They are complex sequences, which makes them more error-prone than other alternatives.

Also notice that the pattern is only said to work 99.99% of the time. That's because regular expressions for emails are never quite perfect. The email standard is not a static document. While in the past you mainly had a username@url.topleveldomain sequence, these days things are less predictable. For one, top-level domains like .com and .net used to be the only game in town. But these days, you can find niche and trendy TLDs like .museum and .party.

And we can assume that this will continue to change going forward. So while we can't have 100% regex supremacy, this is still a relatively valid approach to consider.
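
To make the regex approach concrete, here is a minimal sketch of how the pattern quoted above might be used in JavaScript. The constant and function names are my own for this example, and the sample addresses are made up.

```javascript
// The pattern quoted above from emailregex.com, stored as a regex literal.
const EMAIL_PATTERN = /^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;

// Hypothetical helper: true when the whole string matches the pattern.
function matchesEmailPattern(value) {
  return typeof value === 'string' && EMAIL_PATTERN.test(value);
}

console.log(matchesEmailPattern('jane@example.com'));    // true
console.log(matchesEmailPattern('jane@example.museum')); // true
console.log(matchesEmailPattern('jane@example'));        // false
console.log(matchesEmailPattern('jane.example.com'));    // false
```

Because the last part of the pattern accepts any run of two or more letters rather than a fixed list, newer extensions such as .museum pass without any changes.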

Custom validation

And lastly, you have the option of parsing out your own strings and creating your own rules for what makes an email valid. Let's take a look at a quick example of a standard email.

username @ url . com

There are essentially 5 main parts that comprise an email address.

1. The username portion (local name)
2. An @ symbol
3. The domain name
4. A (.) symbol
5. The domain extension

If you can ensure that these 5 elements are in place, then you have a good place to start for validation.

Here is a C# example that would do just that.

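        public static bool EmailIsValid(string strEmail)
        {
            // Guard against null or empty input before calling string methods.
            if (string.IsNullOrEmpty(strEmail))
                return false;

            // The address must contain an @ symbol.
            int intAtIndex = strEmail.IndexOf('@');
            if (intAtIndex == -1)
                return false;

            // Split around the first @ into the username and everything after it.
            string strPrefix = strEmail.Substring(0, intAtIndex);
            string strSuffix = strEmail.Substring(intAtIndex + 1);

            // The dot has to be in the domain, not just anywhere in the address.
            int intDotIndex = strSuffix.IndexOf('.');
            if (intDotIndex == -1)
                return false;

            string strDomain = strSuffix.Substring(0, intDotIndex);
            string strExtension = strSuffix.Substring(intDotIndex + 1);

            // Each of the three text parts must contain at least one character.
            if (strPrefix.Length == 0 || strDomain.Length == 0 || strExtension.Length == 0)
                return false;

            return true;
        }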

The code essentially relies on finding substrings of characters in order from @ to . and ensuring that there is some content in each of the primary areas.

This method is not perfect, mind you. There is a lot more that goes into the email specification that is not taken into account, such as valid characters and maximum allowable characters. But it is a starting point if you wish to add to it and make it more robust.
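
As one possible next step, the sketch below adds a length check. It is written in JavaScript so it could run in the browser, and the limits are assumptions on my part: 64 characters for the username and 254 for the whole address are the figures commonly cited from RFC 5321, so verify them before relying on them. The function and constant names are made up for this example.

```javascript
// Assumed limits, based on commonly cited figures derived from RFC 5321.
const MAX_LOCAL_LENGTH = 64;
const MAX_TOTAL_LENGTH = 254;

// Hypothetical helper: true when the address and its username fit the limits.
function emailLengthIsValid(email) {
  // Split on the first @, the same way the C# example does.
  const atIndex = email.indexOf('@');
  if (atIndex === -1 || email.length > MAX_TOTAL_LENGTH) return false;

  const local = email.slice(0, atIndex);
  return local.length > 0 && local.length <= MAX_LOCAL_LENGTH;
}
```

Note that the specification counts bytes while a JavaScript string length counts UTF-16 units, so the two only line up exactly for plain ASCII addresses.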

Ideally, you want to validate emails both on the client-side (through JavaScript) and on the server-side as well. This ensures that older browsers that do not have built-in email validation do not affect the data on your server.

The biggest benefit of having your own custom validation is that you are kept in control of the design. If you have a particular UI/UX theme on your web forms, then you won't have to rely on the built-in popup balloon with a static message.
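
To illustrate, here is a rough client-side sketch in JavaScript that applies the same five-part check and returns a different message for each problem, which a form can then display however its design requires. The function name and the message wording are placeholders of my own.

```javascript
// Hypothetical helper: returns null when the address passes the five-part
// check, or a message describing the first problem found.
function describeEmailProblem(email) {
  const atIndex = email.indexOf('@');
  if (atIndex === -1) return 'The address is missing an @ symbol.';

  const local = email.slice(0, atIndex);
  const domain = email.slice(atIndex + 1);
  if (local.length === 0) return 'Add a name before the @ symbol.';

  const dotIndex = domain.indexOf('.');
  if (dotIndex === -1) return 'The domain is missing a dot, as in example.com.';
  if (dotIndex === 0) return 'Add a domain name after the @ symbol.';
  if (dotIndex === domain.length - 1) return 'Add an extension after the dot, such as com.';

  return null;
}

console.log(describeEmailProblem('jane@example.com')); // null
console.log(describeEmailProblem('jane.example.com')); // The address is missing an @ symbol.
console.log(describeEmailProblem('jane@example'));     // The domain is missing a dot, as in example.com.
```

The returned string can be written into any element on the page, so the styling stays entirely in your hands.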

Having any type of validation, whether custom or built-in, is pretty much mandatory these days. With millions of bots crawling the web daily, the last thing that you want is to wake up with a database full of garbage data that you will then have to spend hours cleaning up.

Walter Guevara is a software engineer, startup founder and currently teaches programming for a coding bootcamp. He is currently building things that don't yet exist.
