I work for a startup based in Paris called tinyclues. For our daily processes, we use AWS EC2 and, when possible, Spot Instances. I was surprised to discover that you need to be logged in to the AWS console to follow the evolution of spot prices.
So, as a weekend project, I decided to build a web app that collects and displays the history of spot prices. A second objective of this geeky weekend was to explore technologies and services I did not know.
I quickly understood that I needed to adopt the following constraints: the service should
Furthermore, as I had just returned from PyCon, where I met people from Cloudant and MongoHQ, I wanted to try one of these DaaS offerings.
Spot prices are available via the AWS API. As a Python developer, I use boto to grab those prices.
The following code collects them.
from datetime import datetime

from boto import ec2

ISO8601 = '%Y-%m-%dT%H:%M:%S.000Z'


def fetch_spot_prices(aws_access_key, aws_secret_key, start_time):
    """
    Grab spot prices for every available AWS region.

    The `start_time` parameter should be provided in ISO 8601 format.
    """
    end_time = datetime.utcnow().strftime(ISO8601)
    regions = ec2.regions(aws_access_key_id=aws_access_key,
                          aws_secret_access_key=aws_secret_key)
    connect = lambda region: ec2.connect_to_region(region,
                                                   aws_access_key_id=aws_access_key,
                                                   aws_secret_access_key=aws_secret_key)
    prices = []
    for region in regions:
        prices.extend([x for x in connect(region.name).get_spot_price_history(start_time=start_time,
                                                                              end_time=end_time)])
    return prices
Unfortunately, the start_time parameter is not taken into account (either because of a bug in boto or in the AWS API), so the results of the get_spot_price_history method need to be filtered.
prices.extend([x for x in connect(region.name).get_spot_price_history(start_time=start_time, end_time=end_time)])
becomes
prices.extend([x for x in connect(region.name).get_spot_price_history(start_time=start_time,
                                                                      end_time=end_time)
               if x.timestamp > start_time])
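This plain string comparison is valid because ISO 8601 timestamps with fixed-width, zero-padded fields (like the format used here) sort lexicographically in the same order as chronologically, so no date parsing is needed. A quick sanity check:

```python
# ISO 8601 timestamps with fixed-width fields compare correctly as
# plain strings, which is what makes `x.timestamp > start_time` work.
earlier = '2013-04-23T15:22:19.000Z'
later = '2013-05-20T21:10:05.000Z'
print(earlier < later)  # → True
```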
Once retrieved, those price entries have to be recorded somewhere. As I already said, I wanted to use MongoHQ (and thus MongoDB) or Cloudant (and thus CouchDB) as a database provider. MongoHQ's pricing is less progressive and is tightly bound to the amount of data you want to store. Since I have no plan to reduce the amount of stored data and the cost per GB is very low on Cloudant, I went with Cloudant. Disclaimer: at this point, my choice was entirely guided by the cost of those services; I have not yet looked into a MongoDB vs. CouchDB feature comparison. This may sound strange, but low cost was a strong objective for this project.
So I created an account on the Cloudant platform and set up an awsspotprices database.
To write to this database, I needed to generate an access key (with the corresponding secret key).
Once done, I added a function that records the list of prices returned by the fetch_spot_prices function in my Cloudant database.
from boto.utils import parse_ts
from couchdbkit import (Document, Server, StringProperty, FloatProperty,
                        DateTimeProperty)
from restkit import BasicAuth


class SpotPriceEntry(Document):
    availability_zone = StringProperty()
    instance_type = StringProperty()
    item = StringProperty()
    price = FloatProperty()
    product_description = StringProperty()
    region = StringProperty()
    timestamp = DateTimeProperty()


def record_prices(prices, cloudant_db, cloudant_access_key, cloudant_secret_key):
    server = Server(cloudant_db, filters=[BasicAuth(cloudant_access_key,
                                                    cloudant_secret_key)])
    db = server.get_db("awsspotprices")
    SpotPriceEntry.set_db(db)
    db.save_docs([SpotPriceEntry(availability_zone=price.availability_zone,
                                 instance_type=price.instance_type,
                                 item=price.item,
                                 price=float(price.price),
                                 product_description=price.product_description,
                                 region=price.region.name,
                                 timestamp=parse_ts(price.timestamp))
                  for price in prices])
Finally, I had to compose my two functions to record the AWS spot prices, something like record_prices(fetch_spot_prices(start_time)).
But I still had to figure out how to get the start time.
Here, I decided to take the most recent record time across all instance types and all regions. I suspect this method is somewhat wrong, because we do not know how Amazon delivers pricing history from one region to the next, or from one instance type to another. To be safe, one should look up the last record time instance type by instance type, region by region.
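The safer per-series approach could be sketched like this: track the latest timestamp per (region, instance_type) pair instead of a single global maximum. The records below are made up for illustration.

```python
# Hypothetical (region, instance_type, timestamp) records; in the real
# application these would come from the CouchDB database.
records = [
    ('us-east-1', 'm1.small', '2013-05-20T21:10:05.000Z'),
    ('us-east-1', 'm1.small', '2013-05-19T08:00:00.000Z'),
    ('eu-west-1', 'c1.medium', '2013-05-18T12:30:00.000Z'),
]

# Keep, for each (region, instance_type) pair, the most recent timestamp.
# ISO 8601 strings compare chronologically, so max-by-string is enough.
latest = {}
for region, itype, ts in records:
    key = (region, itype)
    if ts > latest.get(key, ''):
        latest[key] = ts

print(latest)
```

Each series would then get its own start_time, at the cost of one query per pair.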
Computing the max timestamp is done by creating a design document in CouchDB.
CouchDB has a built-in _stats reduce function that suits my needs. The design document is:
"stats": {
"map": "function(doc) {emit(doc.timestamp, Date.parse(doc.timestamp))}",
"reduce": "_stats"
}
For each document, I emit the timestamp as the key and, as the value, the timestamp parsed as an integer (milliseconds since the epoch).
The returned document contains:
{"rows":[
{"key":null,"value":{"sum":109702860597366000,"count":80197,"min":1366732939000,"max":1369084205000,"sumsqr":150064472797683882932844000000}}
]}
Note for CouchDB beginners: it is not obvious where design documents are stored. They are regular documents whose _id starts with _design/ (here, _design/views), and each view lives under the views key of that document.
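Putting that note together with the stats view above, the complete design document would look like the following; the _design/views id matches the views/stats path queried later.

```json
{
  "_id": "_design/views",
  "views": {
    "stats": {
      "map": "function(doc) {emit(doc.timestamp, Date.parse(doc.timestamp))}",
      "reduce": "_stats"
    }
  }
}
```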
Then, all I needed was some glue code to get the start_time value:
import time


def get_start_time(cloudant_db, cloudant_access_key, cloudant_secret_key):
    server = Server(cloudant_db, filters=[BasicAuth(cloudant_access_key,
                                                    cloudant_secret_key)])
    db = server.get_db("awsspotprices")
    ts = db.view("views/stats").fetch_raw().json_body['rows'][0]['value']['max']
    return time.strftime(ISO8601, time.gmtime(ts // 1000))


def record():
    start_time = get_start_time(CLOUDANT_DB, CLOUDANT_ACCESS_KEY, CLOUDANT_SECRET_KEY)
    prices = fetch_spot_prices(AWS_ACCESS_KEY, AWS_SECRET_KEY, start_time)
    record_prices(prices, CLOUDANT_DB, CLOUDANT_ACCESS_KEY, CLOUDANT_SECRET_KEY)
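As a sanity check for the millisecond-to-ISO-8601 conversion, here is the max value from the _stats output above run through the same conversion:

```python
import time

ISO8601 = '%Y-%m-%dT%H:%M:%S.000Z'

# `max` value returned by the _stats reduce, in milliseconds since epoch
ts_ms = 1369084205000

# Drop the milliseconds, convert to a UTC struct_time, format as ISO 8601
start_time = time.strftime(ISO8601, time.gmtime(ts_ms // 1000))
print(start_time)  # → 2013-05-20T21:10:05.000Z
```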
Last but not least, I needed to run this job periodically. PiCloud offers CRON task management.
I should mention the incredible support from both PiCloud and Cloudant. For instance, I had a problem with couchdbkit dependencies, and PiCloud support was very responsive and helpful.
At this stage, I had a regularly updated database filled with price entries. All that was left was to plot the data.
d3.js is arguably the best library around today for data visualization, and a whole ecosystem of JavaScript libraries has been built on top of it.
I chose one of these libraries, specialized in displaying time series: Rickshaw.
My needs were perfectly covered by an example found on the Rickshaw website.
If you look at the awsspotprices website, you will see that the code is almost identical.
My whole application fits in a single HTML page. I serve it from an AWS S3 bucket and use the AWS Route 53 DNS service to make awsspotprices available.
I hope someone finds this application useful; I use it almost every day to follow AWS pricing history. Here is a cost estimation for this service:
Finally, including domain-name registration, this service runs for less than $2 per month.
More resources about AWS Spot pricing: