I spend a good amount of time using AWS. For the most part I love the power that the platform puts into your hands.
With that said, it could be better.
“Over 200 services not enough for you? You’re never going to be happy,” I hear you cry.
To that I reply, that’s probably true. Also, my ticket to re:Invent was expensive this year, so I feel like AWS should come prepared.
“Prepared with what?”
Well, funny you should ask. You see, I have prepared a wishlist that I have whittled down to ten items. So if you are an engineer or product manager at AWS and you dream of being a genie, here is how you make my AWS dreams come true this November.
Without further ado, I present the first entry in my blog-series countdown:
#10: Long-running Lambda functions
I really really love lambda functions. They let me do most of what I want so long as I can make it happen in 15 minutes. Unfortunately, many of the things I want to do take longer than 15 minutes.
“What could take longer than 15 minutes?” you might ask. And that question might be because you’re used to writing web-apps with request / response times well under a second. Or because you really like batching and parallelizing computational work.
If you are in the latter group, I can only say that I don’t share your tastes. Maybe it’s just because I have too much to do, but I like to optimize as late in the game as I possibly can. If you are in the former group, here is an incomplete list of where some of us more backend-focused engineers spend our time.
Database cloning tasks
Database seeding tasks
Web crawling tasks
Media transcoding tasks
ML model building tasks
Infrastructure provisioning tasks
Data analytics tasks
Data synchronization tasks
But if you’re a know-it-all like me, you might be thinking to yourself, “What about Fargate? You can do all of this with Fargate!” Or perhaps, “I don’t need lambda to transcode my video! I have Elemental MediaConvert for that!” Or, “I don’t need my lambda function to build my ML model! I have SageMaker for that!” Or one of any number of other valid points.
Here’s the thing though: AWS Lambda integrations with AWS services are really really good! Especially Step Functions. If I invoke the ECS RunTask API from Step Functions (something I do quite a lot, I might add), it’s rarely as simple as passing JSON directly.
I end up using either environment variable overrides or command overrides, or a command override plus some API gymnastics in my task to hydrate references. I also have to handle returning response or error data back to my state machine.
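To make the difference concrete, here is a rough sketch of the two Step Functions task states, built as Python dictionaries. The function names, cluster, task definition, container name and input fields are all placeholders I made up for illustration. A real Fargate state would also need a NetworkConfiguration block, which I have left out to keep it short.

```python
import json


def lambda_task_state(function_name):
    """Task state that forwards the entire state input to a Lambda function."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {
            "FunctionName": function_name,
            # The whole JSON input is passed through untouched.
            "Payload.$": "$",
        },
        "End": True,
    }


def ecs_task_state(cluster, task_definition, container_name, input_fields):
    """Task state that runs a Fargate task and waits for it to finish.

    Every input field has to be mapped by hand onto an environment
    variable, and each value must already be a string in the state input.
    NetworkConfiguration is omitted here for brevity (assumption: you
    would add it for a real Fargate task).
    """
    environment = [
        {"Name": name.upper(), "Value.$": f"$.{name}"}
        for name in input_fields
    ]
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": cluster,
            "TaskDefinition": task_definition,
            "Overrides": {
                "ContainerOverrides": [
                    {"Name": container_name, "Environment": environment}
                ]
            },
        },
        "End": True,
    }


if __name__ == "__main__":
    # Hypothetical names, purely for illustration.
    print(json.dumps(lambda_task_state("clone-database"), indent=2))
    print(json.dumps(
        ecs_task_state("my-cluster", "clone-database", "worker",
                       ["source_db", "target_db"]),
        indent=2,
    ))
```

The Lambda state is one line of payload plumbing. The Fargate state grows with every field I want to pass in, and it still says nothing about getting a result back out.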
This experience results in monotonous work that needs to be accurate. Not my cup of tea. (I imagine good tooling could help with this pain-point. I’m watching the functionless framework closely here.)
But frameworks will not be able to solve performance issues. You see, lambdas start up within a couple of seconds, while it can take a minute or two for Step Functions to kick off a Fargate task.
Observability is also easier with Lambda functions. You can click straight through from Step Functions to Lambda execution logs, while stopped Fargate tasks only linger for an hour. If you want to click through you’d better do so quickly. I concede that you can hook up EventBridge to log stopped task details. But then you need to build your own user experience on top of what AWS gives you. I don’t have time for that.
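For the curious, the EventBridge workaround looks roughly like this. It is a minimal sketch, assuming a rule with the pattern below that targets a small Lambda function. The handler and helper names are my own, and I only pull out a few fields from the ECS task state change event.

```python
import json

# Event pattern for an EventBridge rule matching stopped ECS tasks.
STOPPED_TASK_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"lastStatus": ["STOPPED"]},
}


def summarize_stopped_task(event):
    """Pick out the details worth keeping once the task itself is gone."""
    detail = event.get("detail", {})
    return {
        "taskArn": detail.get("taskArn"),
        "stoppedReason": detail.get("stoppedReason"),
        # Map each container name to its exit code (None if it never ran).
        "exitCodes": {
            container.get("name"): container.get("exitCode")
            for container in detail.get("containers", [])
        },
    }


def handler(event, context):
    """Lambda entry point: print the summary so it lands in CloudWatch Logs."""
    summary = summarize_stopped_task(event)
    print(json.dumps(summary))
    return summary
```

That gets the data somewhere durable, but finding the right log line for a given state machine execution is still on me.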
An even larger benefit of this proposal is that there is a whole bunch of great tooling built to deploy Lambda functions: Serverless Framework, SAM, Chalice, etc. I can’t use that with Fargate or Step Functions Activities. Why shouldn’t you be able to deploy long-running tasks to AWS with these great frameworks?
All in all, there are lots of reasons to make Lambda functions last longer. But maybe the biggest and best reason is that the competition is already doing this. That’s right, long-running functions are one of the few things Azure does better than AWS. If you provision a function app on Azure with the right plan, you can run it all the time without your functions needing to time out.
Is there a good reason that Lambda functions need to time out after 15 minutes? Probably. Maybe tying this feature to Lambda functions with provisioned concurrency is necessary to keep the Lambda runtime stable from a work-scheduling standpoint. Either way, AWS engineers are smart, and implementing this feature would make this Lambda user’s AWS wishes a reality.