For one of our longtime customers we maintain a mailing list of 20,000+ subscribers.
Running a list that size sets you back $140 per month at Mailchimp, which is not ideal for a small organisation. There might be other options, but back in 2002 it wasn’t illogical to custom-build something for a list this size, and we have maintained it ever since.
Part of it is a subscriber database with a web interface for the usual (un)subscribing. A link to your account is also included in the personalised mail you receive.
Personalised mail means that each outgoing message is different. We also want to know whether each mail address is valid and whether the receiving server actually accepted the newsletter on delivery.
Early on we therefore decided to script the delivery using Python and its smtplib module. That way we can monitor the transaction and store the relevant info for later processing and reporting.
When doing SMTP delivery on your own, one of the first steps is finding the preferred MX record for the address and connecting to it. With 20K+ email addresses it turns out there are roughly 10K different domains in this specific dataset.
Every MX record lookup is a DNS request, and before you can actually connect to the mail server you need to know its IP address, which is another lookup.
We found out that sending out the actual mail would be a lot speedier if we knew the IP of the server to connect to upfront. Therefore, a long time ago, I wrote a script which gets all the email addresses, reduces them to just the domains and does the lookups.
As delivery is done with Python, this preprocessor was also written in Python. The script does not try to be very smart: it just does the DNS lookups (MX and A record) for all the domains and writes a text file containing each domain and the IP found for it.
No threads, just sequentially do all the work. This takes a while, quite a while:
$ time python mx_domains.py
10.594u 0.794s 1:42:15.81 0.1% 1572+1228k 56+10282io 55pf+0w
That’s just a bit over 100 minutes of sheer waiting on DNS lookups. Instead of updating the Python code to an async or multithreaded model I decided to give Go a go. After all, wasn’t that supposed to be the ideal tool for this job, with its goroutines and DevOps-oriented mindset?
It wasn’t the first time I looked at the language, but it was the first time I actually tried to write something useful in it, mostly because I was curious whether it lived up to its image of being the cool new kid in programming land.
After a few hours of trying, reading and pondering I ended up with this code:
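package main

import (
	"fmt"
	"io/ioutil"
	"net"
	"strings"
	"sync"

	"github.com/mr51m0n/gorc"
)

// gorc0 counts the goroutines that are currently running.
var gorc0 gorc.Gorc

// lookup prints the domain followed by the first IP of its first MX host.
func lookup(domain string, wg *sync.WaitGroup) {
	defer wg.Done()
	// Release the slot taken by gorc0.Inc() in main; without this the
	// counter never drops and WaitLow(50) would block forever.
	defer gorc0.Dec()
	addr, status := net.LookupMX(domain)
	if status == nil {
		ip, s := net.LookupHost(addr[0].Host)
		if s == nil {
			fmt.Printf("%s %v\n", domain, ip[0])
		} else {
			// No address found: fall back to the MX host name.
			fmt.Printf("%s %v\n", domain, addr[0].Host)
		}
	} else {
		fmt.Printf("##%s\n", status)
	}
}

func main() {
	content, err := ioutil.ReadFile("domains.txt")
	if err != nil {
		fmt.Println("input file not found")
		return
	}
	domains := strings.Split(string(content), "\n")
	var wg sync.WaitGroup
	for _, element := range domains {
		if element == "" {
			continue // skip blank lines, e.g. after the final newline
		}
		wg.Add(1)
		gorc0.Inc()
		go lookup(element, &wg)
		// Block until fewer than 50 goroutines are running.
		gorc0.WaitLow(50)
	}
	wg.Wait()
	fmt.Println("DONE")
}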
Early versions of this program had no limit on the number of goroutines started and basically were a Denial of Service attack on the DNS server 🙂
Surprised that there is no built-in limiter, I found github.com/mr51m0n/gorc, and after some tweaking I established that 50 concurrently running goroutines do not overload the Google DNS servers. Running it gives:
$ time go run mx.go
DONE, 10279 domains processed
5.523u 4.334s 4:46.62 3.4% 1150+387k 0+21io 0pf+0w
The difference in runtime is staggering: less than 5 minutes! Obviously the Python code could be made much faster by also using coroutines, but I haven’t tried that, so comparing this number to the single-threaded sequential Python version is absolutely not fair. However, I am very impressed with the terseness of the Go code, and perhaps even more with the ease with which my mind adapted to it.
Is there an old C programmer hiding in me trying to get out? It wouldn’t surprise me at all if future projects get some Go mixed in next to the Python.