Giles' blog

Creating a time series from existing data in pandas

Posted on 9 May 2017 in Python |

pandas is a high-performance library for data analysis in Python. It's generally excellent, but if you're a beginner or you use it rarely, it can be tricky to find out how to do quite simple things -- the code to do what you want is likely to be very clear once you work it out, but working it out can be relatively hard.

A case in point, which I'm posting here largely so that I can find it again next time I need to do the same thing... I had a list start_times of dictionaries, each of which had (amongst other properties) a timestamp and a value. I wanted to create a pandas time series object to represent those values.

The code to do that is this:

import pandas as pd
series = pd.Series(
    [cs["value"] for cs in start_times],
    index=pd.DatetimeIndex([cs["timestamp"] for cs in start_times])
)

Perfectly clear once you see it, but it did take upwards of 40 Google searches and help from two colleagues with a reasonable amount of pandas experience to work out what it should be.

Parsing website SSL certificates in Python

Posted on 9 December 2016 in Cryptography, Python, PythonAnywhere, TIL deep dives |

A kindly PythonAnywhere user dropped us a line today to point out that StartCom and WoSign's SSL certificates are no longer going to be supported in Chrome, Firefox and Safari. I wanted to email all of our customers who were using certificates provided by those organisations.

We have all of the domains we host stored in a database, and it was surprisingly hard to find out how I could take a PEM-formatted certificate (the normal base-64 encoded stuff surrounded by "BEGIN CERTIFICATE" and "END CERTIFICATE") in a string and find out who issued it.

After much googling, I finally found the right search terms to get to this Stack Overflow post by mhawke, so here's my adaptation of the code:

from OpenSSL import crypto

for domain in domains:
    cert = crypto.load_certificate(crypto.FILETYPE_PEM, domain.cert)
    issuer = cert.get_issuer().CN
    if issuer is None:
        # This happened with a Cloudflare-issued cert
        continue
    if "startcom" in issuer.lower() or "wosign" in issuer.lower():
        # send the user an email

An HTTP request's journey through a platform-as-a-service

Posted on 20 August 2014 in Talks, PythonAnywhere, Python |

I'm definitely getting better as a public speaker :-) At EuroPython in Berlin last month, I gave a high-level introduction to PythonAnywhere's load-balancing system. There's a video up on PyVideo: An HTTP request's journey through a platform-as-a-service. And here are the slides [PDF].

How many Python programmers are there in the world?

Posted on 24 June 2013 in Python, PythonAnywhere |

We've been talking to some people recently who really wanted to know what the potential market size was for PythonAnywhere, our Python Platform-as-a-Service and cloud-based IDE.

There are a bunch of different ways to look at that, but the most obvious starting point is, "how many people are coding Python?" This blog post is an attempt to get some kind of order-of-magnitude number for that.

First things first: Wikipedia has an estimate of 10 million Java developers (though I couldn't find the numbers to back that up on the cited pages) but nothing for Python -- or, indeed, any of the other languages I checked. So nothing there.

A bit of Googling around gets one interesting hit; in this Stack Overflow answer, "Tall Jeff" says that the 2007 version of Learning Python estimated that there were 1 million Python programmers in the world. Using Amazon's "Look inside" feature on the current edition, they still have the same number but for the present day, but let's assume that they were right originally and the number has grown since then. Now, according to the Python wiki, there were 586 people at the 2007 PyCon. According to the front page at PyCon.org, there were 2,500 people at PyCon 2013. So if we take that as a proxy for the growth of the language, we get one guess of the number of Python developers: 4.3 million.

Let's try another metric. Python.org's web statistics are public. Looking at the first five months of this year, and adding up the total downloads, we get:

Month	Downloads
Jan	2,584,754
Feb	2,539,177
Mar	3,182,946
Apr	3,199,012
May	2,855,033

Averaging that over a year gives us 34,466,213 downloads per year. It's worth noting that these are overwhelmingly Windows downloads -- most Linux users are going to be using the versions packaged as part of their distro, and (I think, but correct me if I'm wrong) the same is largely going to be the case on the Mac.

So, 34.5 million downloads. There were ten versions of Python released over the last year, so for let's assume that each developer downloaded each version once and once only; that gives us 3.5 million Python programmers on Windows.

What other data points are there? This job site aggregator's blog post suggests using searches for resumes/CVs as a way of getting numbers. Their suggested search for Python would be

(intitle:resume OR inurl:resume) Python -intitle:jobs -resumes -apply

Being in the UK, where we use "CV" more than we use "resume", I tried this:

(intitle:resume OR inurl:resume OR intitle:cv OR inurl:cv) Python -intitle:jobs -resumes -apply

The results were unfortunately completely useless. 338,000 hits but the only actual CV/resume on the first page was Guido van Rossum's -- everything else was about the OpenCV computer vision library, or about resuming things.

So let's scrap that. What else can we do? Well, taking inspiration (and some raw data) from this excellent blog post about estimating the number of Java programmers in the world, we can do this calculation:

Programmers in the world: 43,000,000 (see the link above for the calculation)
Python developers as per the latest TIOBE ranking: 4.183%, which gives 1,798,690
Python developers as per the latest LangPop.com ranking: 7% (taken by an approximate ratio of the Python score to the sum of the scores of all languages), which gives 2,841,410

OK, so there I'm multiplying one very approximate number of programmers by a "percentage" rating that doesn't claim to be a percentage of programmers using a given language. But this ain't rocket science, I can mix and match units if I want.

The good news is, we're in the same order of magnitude; we've got numbers of 1.8 million, 2.8 million, 3.5 million, and 4.3 million. So, based on some super-unscientific guesswork, I think I can happily say that the number of Python programmers in the world is in the low millions.

What do you think? Are there other ways of working this out that I've missed? Does anyone have (gasp!) hard numbers?

Running Django unit tests on PythonAnywhere

Posted on 21 May 2012 in PythonAnywhere, Django, Python |

I was working on a side project today, a Django app hosted at PythonAnywhere. While writing some initial unit tests, I discovered a confusing bug. When you try to run the tests for your app, you get an error message creating the database (for the avoidance of doubt, USERNAME was my PA username):

18:57 ~/somewhere (master)$ ./manage.py test
Creating test database for alias 'default'...
Got an error creating the test database: (1044, "Access denied for user 'USERNAME'@'%' to database 'test_USERNAME$default'")
Type 'yes' if you would like to try deleting the test database 'test_USERNAME$default', or 'no' to cancel: no
Tests cancelled.

The problem is that PythonAnywhere users don't have the privileges to create the database test_USERNAME$default (whose name Django's unit testing framework has auto-generated from the USERNAME$default that is the DB name in settings.py). PA only allows you to create new databases from its web interface, and also only allows you to create databases that are prefixed with your-username$

After a bit of thought, I realised that you can work around the problem by setting TEST_NAME in settings.py to point to a specific new database (say, USERNAME$unittest) and then creating a DB of that name from the MySQL tab. Once you've done that, you run the tests again; you get an error like this:

19:02 ~/somewhere (master)$ ./manage.py test
Creating test database for alias 'default'...
Got an error creating the test database: (1007, "Can't create database 'USERNAME$unittest'; database exists")
Type 'yes' if you would like to try deleting the test database 'USERNAME$unittest', or 'no' to cancel:

You just enter "yes", and it will drop then recreate the database. This works, because when you created it from the MySQL page, the settings were set up correctly for you to be able to create and drop it again in the future. Once this has been done once, tests run just fine in the future, with no DB errors.

Obviously we'll be fixing this behaviour in the future (though I can't offhand see how...). But there's the workaround, anyway.

PythonAnywhereAnywhere

Posted on 27 February 2012 in PythonAnywhere, Python |

We recently added something cool to PythonAnywhere, our Python online IDE and web hosting environment -- if you're writing a tutorial, or anything else where you'd find a Python console useful in a web page, you can use one of ours! Check it out:

What's particularly cool about these consoles (apart from the fact that they advertise the world's best Python IDE-in-a-browser) is that they keep the session data on a per-client basis -- so, if you put one on multiple pages of your tutorial, the user's previous state is kept as they navigate from page to page! The downside (or is it an upside?) is that this state is also kept from site to site, so if they go from your page to someone else's, they'll have the state they had when they were trying out yours.

Bug or feature? Let me know what you think in the comments...

Teaching programming

Posted on 14 October 2011 in PythonAnywhere, Python |

One thing we wanted to do with PythonAnywhere, our Python online IDE and web hosting environment, was put together a short introduction to Python for non-programmers. I wrote the first cut the other day.

I've always been a fan of the Pimsleur language lessons. Unlike very traditional ways of teaching foreign languages, they don't make you learn vocabulary lists and grammatical rules. Unlike more modern systems, they don't try to teach you phrases. There's no written textbook, just a bunch of CDs (or these days MP3s). And they throw you right in at the deep end.

At the start of each Pimsleur course, a soothing voice says something like "Welcome to Pimsleur Klingon part one, lesson one. Listen to the following conversation." And then you hear something that sounds like this. The soothing voice comes back and says "in 30 minutes you'll hear that again, and you'll understand it." And then you're introduced to the different sentences, and told to repeat stuff. "Repeat this: tlhab 'oS 'Iw HoHwI' So' batlh" they'll say, and you'll stumble your way through it. Then they'll get you to repeat tlhab on its own a few times. For complicated words they'll break it down, and get you to repeat the last bit first, gradually building up. I. wI. oHwI. HoHwI. After 30 minutes of this, your brain is aching, and then they play the conversation you heard at the start -- and it makes perfect sense!

Quite incredible.

So what's this all got to do with teaching programming? Well, we were looking for a decent "Python for non-programmers" course that we could use, and while we found a couple, they all seemed to have the same problem -- they'd work by gradually introducing simple concepts, and after twenty pages you might have learned enough to write a trivial program.

That's not how any of us learned programming. Instead, we tended to pick it up bit by bit by inspection of complete working bits of code. For me, it was typing in listings from the home computing magazines that I bought. (Note for younger readers: listings were printed programs in magazines in the days when disks were too expensive to give away with magazines. (Note for even younger readers: disks were primitive USB-stick-like things that people used to use to store data, and were often attached to magazines as a way of passing data from magazine to reader in the days when you couldn't just put a URL in the article text. (Note for yet younger readers: magazines were collections of very thin slices of wood on which text and images were "printed" using chemical dyes, which could be purchased in places called shops. They were popular in the Dark Ages before the Facebook.)))

That way of learning -- which must be even easier now that any aspiring programmer can look at the source of tons of open source software, or just view the source of any web page -- was quick, efficient, and got you straight to a position where you felt you'd achieved something. You'd written your first program! You could change the bits you understood, and leave the rest for a later day when you knew more. A great way to learn, as well as being excellent training for later days when you find yourself maintaining some horrific codebase written by someone who thinks that operator overloading is the best thing since sliced bread.

So, my theory for the first part of this tutorial -- and, if it works out, for any following ones -- is that each should start with a program listing that the reader won't understand yet, but should be able to understand with a little explication. Core concepts should be drilled in by relentless repetition. No baby-talk -- start using the technical terms straight away, and then repeat them enough that if the reader forgets what "variable" means then they can work it out from context. And aim to get them from knowing nothing to simple OO and functional programming in five half-hour lessons.

What do you think?

Busy, busy, busy

Posted on 27 April 2011 in PythonAnywhere, Startups, Python, Resolver One |

A couple of weeks back we were brainstorming about other ways we could make use of the code infrastructure we'd put together for Dirigible. We had loads of stuff for running functional tests, determining dependencies between spreadsheet cells, executing untrusted user code safely on our servers, and so on. Any of those could potentially make an interesting product, so we put together some basic landing pages, one for each idea, and put a bit of money into Google AdWords to see if any of them got any interest.

One of them took off immediately, and even started getting traction on Twitter: PythonAnywhere, an online Python IDE and web application environment -- basically, Dirigible without the spreadsheet grid. This fits in with what we suspected -- lots of people were interested in Dirigible, but it wasn't the spreadsheet side of it that excited them, it was the easy Python grid computing.

What's been particularly cool with this idea is not only that most of it is done and "just" needs breaking out of Dirigible and putting into a new product, but that people are keen to engage with us about it. When people signed up on our landing page, we sent them an email with a few questions -- "What would you use it for? Which features excite you? What would you pay for it? Any suggestions for other features?" About 25% of people have replied, with lots of great feedback, and we've changed our plans (and altered the relative priorities of features) based on their input. All very Lean Startup...

Anyway, all good clean fun. If you'd like a look at it when it goes into beta, you can sign up on the site, or just leave a comment below.

London Financial User Group Meeting: 17 January

Posted on 10 January 2011 in Python, Finance |

The next meeting of the LFPUG will be on 17 January, from 19:00 – 21:00 — location TBD. Two talks are scheduled:

Developing and Deploying Python applications on GPU Cloud Platforms, Suleiman Shehu, CEO of Azinta Systems
Black-box model validation with Python, Patrick Henaff

Both sound interesting, the first in particular! There's still time to propose a lightning talk, too -- I think the best way is to send the organiser, Didrik Pinte, an email. If you're on LinkedIn, there's also more information in the LFPUG group there.

A big announcement from Resolver Systems

Posted on 1 October 2010 in Dirigible, Python, Resolver One |

So, I've let various hints drop over the last few months, but we did the official annoucement today: a new product from Resolver, called Dirigible (thanks to Wikipedia's "Random page" link :-). It's been in private beta for a few weeks, and we decided it was time to get the news out there about it. As to what it is... our tagline is that it is "a spreadsheet-like tool for Python grid computing". That's kind of fuzzy (and probably needs a bit of work), but what I do want to make clear is what it's not: it is not just a web-based version of Resolver One, our desktop Python spreadsheet.

Instead, it's something much more developer-focused, built from the ground up -- sharing code with Resolver One, of course, but not trying to duplicate it. To quote the official annoucement:

We took the things from Resolver One that made software developers say "wow" -- like Python-based formulae, objects in the grid, and the ability to treat a spreadsheet as a function and call it from another sheet. Then we worked out what we could make better by coding just those things as a web application backed by traditional Python -- not IronPython -- on a grid of Linux servers.

You can read more about Dirigible and how it relates to Resolver One on the company blog, or there's a more concise version on the product's own web page. If you'd like to try it out, there's a signup form on the main Dirigible page; we're keeping beta user numbers small for now, but building up as we gain confidence that we've not done anything totally stupid with regard to security or scalabity...

I think everyone at Resolver's done a great job in putting it all together -- of course, being able to share code with Resolver One helped a lot :-) And I'm sure that Dirigible's going to be a great addition to the company's product line.