This post is one in a series of sample chapters from my free eBook, 20 Things You Can Do To Save Money On Your Amazon Web Services Bill Today. Feel free to download the full eBook if you want the whole thing.
Bottom line: If you have HTTP requests to your RESTful web server taking longer than 5 seconds (I might even argue 1 second), then you really should be doing some of that processing in a worker somewhere instead of in the web server.
Chances are that if you identify the web requests that take the longest to run and consume the most CPU/memory, and then move them out of your web servers into some type of worker process, you can lower the CPU/memory your web servers need and save a lot of money.
One of the biggest offenders is often reports that compile lots of data, so I suggest looking there first.
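A quick way to find those outliers is to group request timings by endpoint and look at the slow tail. This is only a minimal sketch: the `(endpoint, seconds)` sample format and the `slow_endpoints` helper are my own assumptions for illustration, so adapt them to whatever your access logs or APM tool actually give you.

```python
from collections import defaultdict
from statistics import quantiles


def slow_endpoints(samples, threshold_seconds=5.0):
    """Return (endpoint, p95_seconds) pairs whose 95th percentile exceeds the threshold.

    `samples` is an iterable of (endpoint, duration_seconds) tuples. This input
    shape is hypothetical; feed it from your own request logs.
    """
    by_endpoint = defaultdict(list)
    for endpoint, seconds in samples:
        by_endpoint[endpoint].append(seconds)

    flagged = []
    for endpoint, durations in by_endpoint.items():
        if len(durations) < 2:
            p95 = durations[0]  # a single sample is its own percentile
        else:
            # n=20 yields 19 cut points; the last one is the 95th percentile.
            p95 = quantiles(durations, n=20, method="inclusive")[-1]
        if p95 > threshold_seconds:
            flagged.append((endpoint, p95))

    # Slowest endpoints first, since they are the best candidates to move to a worker.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


example = [
    ("/users", 0.10), ("/users", 0.20), ("/users", 0.15),
    ("/reports", 12.0), ("/reports", 8.0), ("/reports", 30.0),
]
for endpoint, p95 in slow_endpoints(example):
    print(f"{endpoint}: p95 is {p95:.1f}s, consider moving it to a worker")
```

I use the 95th percentile rather than the average so that one endpoint with an occasional very slow request still shows up.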
You might have a solution where you run something like Celery (https://docs.celeryq.dev/) that is always running, listening to a data source that acts like a queue and tells it when to consume tasks. Unless there is a fairly constant load being passed into the queue, this is probably inefficient. This is because when there is no load being passed in, you are still paying for provisioned compute resources.
Let’s say you wanted to save some money so you under-provision your queue a bit to be optimized for your average load (this is NOT a recommendation). Then, when you send a big task that needs more compute resources through the queue, it ends up blocking the queue or causing the queue to crash. Both would result in a bottleneck that would slow down or stop the other tasks from processing.
You could split the queue into multiple queues so tasks with larger resource requirements have their own queue and even compute resources. So, one queue is for small, fast tasks, and one queue is for big, long tasks. This would solve the blocking problem for the small, fast tasks, but unless you have a very consistent flow of the big tasks 24/7, you will have times when you are paying for compute resources you may not need and wasting money.
With a worker like Celery, you can put autoscaling on the workers. The problem is with long-running tasks. When ECS decides to shut down a task because it is no longer needed, it sends a TERM signal to the task right away. A worker like Celery should then try to do a “Graceful Shutdown.” This means it should try to finish the job it is currently processing and then shut down its process, which will allow ECS to finish killing the unneeded ECS Task.
The problem is that after ECS sends that TERM signal, it then waits a certain amount of time, after which it sends a KILL signal and kills the task anyway. So if your job running in the worker takes longer than the duration ECS is willing to wait, then your task will get killed mid-run, and you risk data corruption.
Right now, the maximum ECS is willing to wait is 120 seconds (this is not the default, which is 30 seconds, so you have to configure it). My rule is not to use this technique with jobs that often run longer than 60 seconds. But if you have a consistent flow of jobs that take less than 60 seconds, I would strongly consider this option.
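Celery handles the TERM signal for you, but if you ever write your own worker loop, the graceful shutdown idea looks roughly like this. It is a sketch under assumptions: `GracefulWorker`, `fetch_job`, and `handle_job` are hypothetical names I made up for illustration, not part of Celery or ECS.

```python
import signal


class GracefulWorker:
    """Finish the current job after a TERM signal, then stop taking new ones."""

    def __init__(self):
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Signal handlers should do as little as possible: just set a flag.
        self.stop_requested = True

    def install(self):
        # Call this from the main thread at worker start-up.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, fetch_job, handle_job):
        """Process jobs until a stop is requested; returns how many finished."""
        processed = 0
        while not self.stop_requested:
            job = fetch_job()
            if job is None:
                break  # queue drained; a real worker would keep polling instead
            handle_job(job)  # the job in progress always runs to completion
            processed += 1
        return processed
```

The catch from above still applies: the flag is only checked between jobs, so a single job that runs longer than the stop timeout will still get killed mid-run.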
This option is more cost-efficient, but jobs sent through it will take longer to process because there is a boot-up time. It will work with ECS workers and a tool like Celery. You will need a queue such as AWS Simple Queue Service (SQS), whose message count AWS can use to drive autoscaling rules.
You would set up a worker ECS service that has one worker running. You would then set an autoscaling rule that checks how many messages are currently in the queue and adds tasks as the count grows. If the message count is at zero, you could scale back down to one task.
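The scaling rule itself is simple arithmetic. In AWS you would express it as an autoscaling policy on the queue's message count (SQS publishes this as the `ApproximateNumberOfMessagesVisible` metric) rather than in your own code, so treat the sketch below as a way to reason about the numbers. The `desired_task_count` function and its default values are assumptions I picked for illustration.

```python
import math


def desired_task_count(visible_messages, messages_per_task=10, min_tasks=1, max_tasks=10):
    """How many worker tasks to run for the current queue depth.

    messages_per_task is a hypothetical tuning knob: roughly how much backlog
    one worker can clear in an acceptable amount of time.
    """
    if visible_messages <= 0:
        return min_tasks  # empty queue: fall back to the floor
    needed = math.ceil(visible_messages / messages_per_task)
    return max(min_tasks, min(max_tasks, needed))


print(desired_task_count(0))    # empty queue, floor of one task
print(desired_task_count(25))   # 25 messages at 10 per task
print(desired_task_count(500))  # capped at max_tasks
```

The floor is a parameter on purpose: with `min_tasks=0` the same rule runs nothing while the queue is empty and boots a task as soon as one message arrives.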
If you wanted to go extreme, you would set up a worker ECS service that has 0 workers running by default and therefore costs you nothing. Then, set a rule for it to boot up only when there are one or more messages in the queue. Just make sure you are comfortable with a fairly slow 30-60 second boot-up time.
Option 2 - AWS Lambda:
I don't want to sound like a Lambda fanboy here. There are plenty of occasions when Lambdas are not the right tool for the job, but they might just be the right tool if you are only processing sporadic traffic. If you do not have a baseline of messages getting queued up for processing, AWS Lambda is perfect. It will scale up faster than the Always On Auto Scaling option and scale down just as quickly.
At the time of writing, Lambda functions time out after a maximum of 900 seconds (15 minutes), so there are limits on what they can run. If that is a problem for your jobs, you will want to check out the next section on AWS Batch.
Batch is the big butch solution for longer-running, beefier compute loads. AWS Batch can even handle GPU compute loads (see links below for more details). AWS Batch has its own queueing system and only boots up ECS Tasks when there are “Jobs” in the queue to be worked on. That way, like the Lambda solution, you are only charged while a Job is actively using CPU/memory resources, so the total time billed is typically the time the job runs plus about 60 seconds to account for start-up and shutdown.
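To get a feel for what that means, here is the billing arithmetic as a small sketch. The workload (20 ten-minute jobs a day) is a made-up example, the function names are mine, and I am assuming the always-on worker and the Batch job use the same CPU/memory size so that billed time is a fair comparison.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def on_demand_billed_seconds(job_seconds, jobs_per_day, overhead_seconds=60):
    """Billed time when compute only exists while a job runs.

    overhead_seconds is the rough start-up and shutdown allowance per job.
    """
    return jobs_per_day * (job_seconds + overhead_seconds)


def billed_fraction(job_seconds, jobs_per_day, overhead_seconds=60):
    """On-demand billed time as a fraction of one always-on worker's day."""
    billed = on_demand_billed_seconds(job_seconds, jobs_per_day, overhead_seconds)
    return billed / SECONDS_PER_DAY


# Hypothetical workload: 20 jobs a day, 10 minutes each.
billed = on_demand_billed_seconds(600, 20)
print(f"{billed} billed seconds, {billed_fraction(600, 20):.0%} of an always-on day")
```

The closer that fraction gets to 100%, the weaker the case for on-demand compute, which is the "fairly constant load" situation where an always-on worker makes sense.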
How long a task runs, and how quickly it has to start, are key factors in deciding what infrastructure is best to run it on. If it's something that needs to be kicked off within a few milliseconds of being put in the queue, an “Always On” solution might be needed. If it can wait a second or two, then Lambda is also an option.
Not all tasks are the same, nor do they have the same compute/memory needs or run duration.
You can have a queue for the fast-running tasks on Lambda and then send all of your longer-running tasks to AWS Batch. Get creative with it.
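If it helps, the decision can be written down as a tiny routing function. This is a sketch of the rules of thumb from this chapter, not an official AWS decision tree: `choose_backend`, the one-second cut-off, and the returned labels are all assumptions of mine.

```python
LAMBDA_MAX_SECONDS = 900  # Lambda's 15-minute limit at the time of writing


def choose_backend(expected_seconds, max_start_delay_seconds):
    """Pick where a task should run based on run time and start-up tolerance.

    expected_seconds: how long the task usually takes.
    max_start_delay_seconds: how long the task can wait before it starts.
    """
    if max_start_delay_seconds < 1:
        # Needs to start almost immediately: only an always-on worker can do that.
        return "always-on worker"
    if expected_seconds < LAMBDA_MAX_SECONDS:
        # Can wait a second or two and fits inside the Lambda time limit.
        return "lambda"
    # Too long for Lambda, and it can tolerate the start-up time.
    return "batch"


print(choose_backend(expected_seconds=2, max_start_delay_seconds=0.05))
print(choose_backend(expected_seconds=30, max_start_delay_seconds=5))
print(choose_backend(expected_seconds=3600, max_start_delay_seconds=120))
```

In practice you would leave some headroom under the Lambda limit, since a task that "usually" takes 14 minutes will sometimes take 16.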
Pricing models like Fargate Spot are a great way to save even more money on your compute resources without needing to commit to long-term minimums. It basically is a way to save money by renting virtual compute power when the customers who are paying top dollar are not using it. The savings can be 30% or more.
This is great for background tasks that are not super time-sensitive to run. Be careful, though, when you write your code, as it could get terminated mid-job with about 2 minutes’ notice for it to gracefully shut down. I suggest implementing some type of DB transaction that will roll back everything if it gets killed. Additionally, see the Advanced Section on Cursors/Checkpoints to make your workers more fault tolerant.
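Here is roughly what the "roll back everything" idea looks like. I am using Python's built-in `sqlite3` only so the sketch runs anywhere; the `ledger` table, `open_ledger`, `process_batch`, and the `Interrupted` exception are hypothetical names for illustration, and your real database driver will have its own transaction API.

```python
import sqlite3


class Interrupted(Exception):
    """Stands in for the worker being told to stop in the middle of a job."""


def open_ledger():
    """Create an in-memory database with one example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (name TEXT, amount INTEGER)")
    return conn


def process_batch(conn, rows, fail_at=None):
    """Insert all rows or none of them. Returns True if the batch committed."""
    try:
        with conn:  # commits if the block finishes, rolls back if it raises
            for index, (name, amount) in enumerate(rows):
                if index == fail_at:
                    raise Interrupted("stopped mid-job")  # simulated interruption
                conn.execute(
                    "INSERT INTO ledger (name, amount) VALUES (?, ?)", (name, amount)
                )
    except Interrupted:
        return False
    return True


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
```

If the job is interrupted halfway, none of its rows are kept, so rerunning it later does not leave you with half a batch or duplicates.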
AWS Batch Pricing (there are no additional charges, but see for yourself): https://aws.amazon.com/batch/pricing/
AWS Fargate Spot Pricing: https://aws.amazon.com/fargate/pricing/
AWS SQS: https://aws.amazon.com/sqs/pricing/