This is the final post in my countdown of items on my 2022 re:Invent wishlist. You can find the previous post here.
It is the night of November 10th and re:Invent is arriving pretty darn soon. Because of the time crunch, and frankly because I have more interesting things to write about than missing AWS features, I am finishing my re:Invent wishlist here and now. In one blockbuster, BuzzFeed-style list.
So without further ado, here are the top 6 items I want most this re:Invent:
#6 - Cognito as a SAML Identity Provider
In a typical authentication workflow with AWS Cognito, an application manages its users in Cognito user pools. If an organization wishes to allow all of its users to access the application, they establish a SAML or OIDC connection with Cognito whereby users can log in with their organization’s identity provider.
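To make that concrete, here is a minimal boto3 sketch of the direction Cognito supports today; the user pool ID, provider name, and metadata URL are placeholders:

```python
import boto3

cognito = boto3.client("cognito-idp")

# Today's supported direction: federate an organization's SAML IdP *into* a
# Cognito user pool. Pool ID, provider name, and metadata URL are placeholders.
cognito.create_identity_provider(
    UserPoolId="us-east-1_EXAMPLE",
    ProviderName="AcmeCorpSAML",
    ProviderType="SAML",
    ProviderDetails={"MetadataURL": "https://idp.example.com/saml/metadata"},
    AttributeMapping={"email": "email"},
)
```

The wish below is the mirror image of this call: Cognito acting as the identity provider rather than the service provider.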
Cognito does not currently support a reverse workflow. Let’s say I have built a website with Cognito and have a bunch of users loaded up into my user pools. Now, I want to enable a subset of these users to access my Tableau analytics dashboard using their regular application login credentials.
Would it not be awesome if you could establish a SAML trust between Tableau and Cognito?
Imagine if a user who attempts to access Tableau could automatically be redirected to their regular application login screen. Imagine if your application’s user pool users could use their application credentials to gain access to your Jira boards and see your internal roadmaps.
This re:Invent, AWS, please transform Cognito user pools into powerful directories capable of serving as an identity provider to any number of third-party apps that support SAML authentication and application assignments.
#5 - Scale to Zero for “Serverless” Resources
Until AWS released Serverless Neptune a couple of weeks ago, I was pretty bullish about this wishlist item.
Only a couple of years ago, I was giddy with excitement at the announcement of Aurora Serverless. Today, I am a customer of Aurora Serverless, but much of my initial excitement has waned.
Perhaps I had unrealistic expectations. After all, I have been using Lambda since 2016, DynamoDB since 2015, and S3 since college in 2014. DynamoDB, S3, and Lambda effectively let me experiment as much as I wanted, for free, without the fear of a sudden, large AWS bill I couldn’t afford. I’d never built a VPC back then.
In my head, Aurora Serverless would be just like DynamoDB, except I could model data relationally, do joins, and connect it to ActiveRecord, Hibernate, or SQLAlchemy. Plus, they’d announced this thing called the Data API, which let me perform SQL queries over HTTP without worrying about connection pooling.
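For anyone who hasn’t used it, a Data API call is just an HTTPS request; a minimal boto3 sketch, with placeholder ARNs and database name:

```python
import boto3

rds_data = boto3.client("rds-data")

# Query Aurora over HTTP via the Data API; no VPC connectivity or
# connection pool required. The ARNs and database name are placeholders.
result = rds_data.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:111111111111:cluster:example-cluster",
    secretArn="arn:aws:secretsmanager:us-east-1:111111111111:secret:example-secret",
    database="app",
    sql="SELECT id, name FROM users WHERE id = :id",
    parameters=[{"name": "id", "value": {"longValue": 42}}],
)
print(result["records"])
```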
Since then, I’ve used Aurora Serverless in my workloads, but I don’t use it to experiment much because it hasn’t lived up to its promise. Even while idle, it charges you money, and the VPC dependencies make it feel a lot more cumbersome than DynamoDB.
Skip ahead a couple of years, and while Aurora Serverless has improved tremendously across a number of dimensions, it feels even less serverless. Aurora Serverless V2 isn’t highly available out of the gate: you have to deploy a cluster and then serverless instances in multiple availability zones in your VPC. It ends up feeling like an autoscaling RDS cluster without the serverless promise of scaling to zero. The Data API has also gone by the wayside.
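To illustrate how much ceremony is still involved, here is roughly what standing up Aurora Serverless V2 looks like with boto3; the identifiers and credentials are placeholders, and note that MinCapacity cannot be set to zero:

```python
import boto3

rds = boto3.client("rds")

# Serverless V2 still means a cluster plus db.serverless instances that you
# spread across AZs yourself (pick an engine version that supports Serverless V2).
rds.create_db_cluster(
    DBClusterIdentifier="example-cluster",
    Engine="aurora-postgresql",
    MasterUsername="appadmin",
    MasterUserPassword="change-me-please",  # placeholder
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 4},  # 0.5 is the floor
)

# One writer plus one reader so the cluster survives an AZ failure.
for i in range(2):
    rds.create_db_instance(
        DBInstanceIdentifier=f"example-instance-{i}",
        DBClusterIdentifier="example-cluster",
        DBInstanceClass="db.serverless",
        Engine="aurora-postgresql",
    )
```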
Neptune Serverless was recently released with a minimum of 2.5 NCUs. Once more it autoscales, but it costs over $289/month to run even when idle. Its operational model feels very similar to Aurora Serverless V2.
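Working backwards from that figure, the implied rate is roughly $0.16 per NCU-hour (verify against current pricing); the floor is easy to sanity-check:

```python
# Back-of-the-envelope check on the ~$289/month floor, before storage and I/O.
min_ncus = 2.5
assumed_ncu_hour_rate = 0.16   # USD, rough assumption; check current pricing
hours_per_month = 24 * 30
print(min_ncus * assumed_ncu_hour_rate * hours_per_month)  # ~288 USD
```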
Although AWS calls these two services serverless, they only seem to meet two of the four serverless criteria AWS describes in its serverless FAQs:
Q: What makes a service or application serverless?
We founded the concept of serverless on the following tenets: no server management, pay-for-value services, continuous scaling, and built-in fault tolerance. When adopting a serverless service or building a serverless architecture, these ideals are fundamental to serverless strategy.
With the new Aurora Serverless and Neptune Serverless offerings, the tenets of no server management and continuous scaling hold true. The tenets of pay-for-value and built-in fault tolerance do not hold up the same way.
I would love to see these two “Serverless” offerings scale down to zero and handle configuration of high availability out of the box.
#4 - Availability Zone Equality
One of the most frustrating things to experience when building on AWS is what I call the availability zone equality problem. It goes like this:
You plan your network architecture and create a VPC and subnets, usually in randomly selected availability zones.
Some time later, you elect to deploy AWS Workspaces in your VPC and discover that your chosen subnets invariably do not support AWS Workspaces because they’re in the wrong AZ. You see, it turns out that AWS Workspaces only supports a subset of availability zones in each region.
Why is Workspaces only available in a subset of AZs? Who can say.
AWS Workspaces, however, isn’t the only culprit of the availability zone equality problem. Amazon Nimble Studio documents the same behavior.
Worse still, some limitations are not documented. Byron Wolfman of HashiCorp writes about the absence of Nitro EC2 instances in availability zone use1-az3. EKS sometimes errors out with:
Cannot create cluster 'example-cluster' because region-1d, the targeted Availability Zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these Availability Zones: region-1a, region-1b, region-1c
I’ve seen OpenSearch and Amazon RDS clusters sometimes fail with similar errors too.
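One partial mitigation is to resolve your account’s randomized AZ names to stable AZ IDs and check capacity offerings before committing to subnets; a boto3 sketch, with the instance type as just an example:

```python
import boto3

ec2 = boto3.client("ec2")

# Map this account's AZ names (randomized per account) to stable AZ IDs.
zones = ec2.describe_availability_zones()["AvailabilityZones"]
name_to_id = {z["ZoneName"]: z["ZoneId"] for z in zones}

# Check which AZ IDs actually offer the instance type you care about.
offerings = ec2.describe_instance_type_offerings(
    LocationType="availability-zone-id",
    Filters=[{"Name": "instance-type", "Values": ["m6i.large"]}],
)
offered = {o["Location"] for o in offerings["InstanceTypeOfferings"]}

for name, az_id in sorted(name_to_id.items()):
    print(name, az_id, "offered" if az_id in offered else "NOT offered")
```

This helps for EC2 capacity, but most managed services offer no equivalent lookup, which is exactly the frustration.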
This re:Invent, if AWS shows progress toward making all services and instance types available across all availability zones in supported regions, I will be a happy man.
#3 - AWS Multi-account Console
While administering an AWS environment securely is made a lot easier with multiple accounts, multi-account strategies are not without pain-points. One of the pain-points of building multi-account workloads is that it’s impossible to be logged into multiple AWS accounts in the console at the same time.
AWS’s current solution to this dilemma is AWS SSO, which lets you easily switch between AWS accounts or roles in your organization. While an improvement over the old role-switching experience, I still find it frustrating.
A common use case is trying to debug an ECS task that uses FireLens to ship all of its logs to CloudWatch in a log-archive AWS account. I have to either open separate Chrome profiles or keep switching between accounts as I read the logs and debug the ECS task configuration.
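One way to limp around this today is a small script that assumes a role into the log-archive account while the default session stays in the workload account; a rough boto3 sketch with made-up account ID, role, log group, and cluster names:

```python
import boto3

# Assume a read-only role in the log-archive account (values are placeholders).
creds = boto3.client("sts").assume_role(
    RoleArn="arn:aws:iam::111111111111:role/LogArchiveReadOnly",
    RoleSessionName="debug-firelens",
)["Credentials"]

logs = boto3.client(
    "logs",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Read the shipped container logs from the log-archive account...
events = logs.filter_log_events(logGroupName="/ecs/example-service", limit=20)

# ...while the default session stays in the workload account to inspect the task.
ecs = boto3.client("ecs")
task = ecs.describe_tasks(cluster="example-cluster", tasks=["example-task-id"])
```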
Another use case I often encounter is debugging network configuration issues that span multiple accounts.
Item #3 on my re:Invent wishlist for this year is a single console experience. This experience should allow me to view resource configurations across accounts and regions in a centralized place, and search and filter these resources to help identify misconfigurations more quickly.
#2 - ECS Stateful Sets
There is currently no native integration for mounting EBS volumes to ECS tasks to give an application persistent volumes. For more information on why this would be awesome, see this GitHub issue, which has been open since January 2019.
Stateful services with low latency requirements are one of the few use-cases where I recommend that customers use EKS instead of ECS.
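In the meantime, the closest native option is EFS, which is fine for shared files but not for the low-latency block storage these workloads usually need. A rough boto3 sketch of an EFS-backed task definition, with placeholder IDs and image:

```python
import boto3

ecs = boto3.client("ecs")

# EFS is today's native persistent-volume option for ECS tasks.
# An executionRoleArn would also be needed in practice; omitted for brevity.
ecs.register_task_definition(
    family="stateful-example",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    volumes=[{
        "name": "data",
        "efsVolumeConfiguration": {
            "fileSystemId": "fs-0123456789abcdef0",  # placeholder
            "transitEncryption": "ENABLED",
        },
    }],
    containerDefinitions=[{
        "name": "app",
        "image": "public.ecr.aws/docker/library/redis:7",
        "essential": True,
        "mountPoints": [{"sourceVolume": "data", "containerPath": "/data"}],
    }],
)
```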
#1 - Serverless OpenSearch
I am a huge fan of OpenSearch. It is a real pain-point, however, to operate an OpenSearch cluster. You run out of shards on your nodes, or your performance starts suffering, or you run out of hard-drive space. Or your data isn’t refreshing quickly enough. Or your AWS bill skyrockets. Or your single node falls over because you didn’t configure things according to best practices.
It requires time and domain knowledge to size clusters and allocate resources for OpenSearch. A serverless solution for searching and indexing documents, one that removes some of the domain knowledge needed to operate the infrastructure reliably and cost-effectively, would be absolutely awesome.
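To make the wish concrete, the day-to-day experience I’m after is just indexing and searching documents, with none of the shard or node math; a sketch using the opensearch-py client, with a hypothetical endpoint and credentials:

```python
from opensearchpy import OpenSearch

# Hypothetical endpoint and credentials; the point is the simplicity of the calls.
client = OpenSearch(
    hosts=[{"host": "search-example.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

# Index a document and search for it; no shard counts, node sizing, or storage planning.
client.index(index="articles", id="1",
             body={"title": "re:Invent wishlist", "year": 2022}, refresh=True)
hits = client.search(index="articles", body={"query": {"match": {"title": "wishlist"}}})
print(hits["hits"]["total"]["value"])
```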
And please, AWS, no minimum of 2.5 OpenSearch Capacity Units for billing purposes.
One last thing…
I’m going to re:Invent for the first time this year. If you’ll be there too and would like to connect, my Twitter DMs are open. Alternatively, feel free to reach out on LinkedIn.