ashley madison's boobs

Ashley Madison for DBAs

Our database guru couldn’t keep her hands off the leaked Ashley Madison database. So we gave her a copy and set her loose on it. She reports back, “I’ve never had so much fun with a database. Thank you, thank you.

Read on to find out what she found and how she found it.

The Database Files

We cover the technical aspects of the unfolding hacking incident on our parent web-site in a series of Dumping Ashley articles. The series covers far more than just the database, so we’ll summarize the main points of relevence to DBAs here.

Our DBA was delighted to find that the database files are in the form of mysql dumps. One for each dynamic table. Her, and our, favourite RDBMS. When we first grabbed the torrent from the Dark Net we feared it might be in MSSQL format, as hacked databases so often are.

We downloaded it anyway and fortune favouring the bold, we got lucky:

ashley madison dump file list

These file-types are typically used when making back-ups of a database and it is very quick to reload them into a new database. Loading from within the mysql client on an Ubuntu platform was a breeze. We have a twin CPU database host and the dump files are well-formed.

Our DBA was quickly writing queries

She had to make a couple of minor edits with a hex-editor, but the tables were soon loaded. Weighing-in at 37,000,000 rows in each table makes this a borderline small to medium database.

The most interesting table is am_am_member our DBA says because, “that’s the one they’ve got all their fetishes listed in“. Full-text querying this table is fairly painful though, with the important data spread across many fields.

So she decided to drop the database and make a few more adjustments with the hex-editor. She added some extra indexes and after reloading created a new table that denormalised all the user-generated text fields, catenating them into one to allow for less complex queries when searching.

catenate_text

This sub-table she linked to the main table through the pnum primary key, so the original data can be consulted if needed, and with speed if the pnums are known. Full table scans of user-generated text can still take minutes though.

cops on ashley madison

There is another member details table that contains duplicate infomation, and it’s keywords field may have already done the text catenation. Once we’ve matched up the id and pums from the two tables – if this can be reliably done without the referential tables – we will be able to combine both.

She is now able to supply accurate on-demand data to support articles for our parent site. That’s what’s important.

Links
Dumping Ashley
Ashley Madison In Numbers
Charts and Analytics
User Profile Captions
Latest Updates

Refactoring in Progress

The duplications across tables, and the field names within them, suggest that Avid Life Media, owners of the Ashley Madison brand, were in the process of refactoring when the snapshot was taken. As their user base was growing they were likely having performance issues with their schema and redesigning it for anticipated future volumes.

The inevitable result is that the DB is a mess.

The leaked dumps only contain dynamic tables. The static data tables are missing. It is not possible to fully reconstruct any user’s profile without these reference tables. These tables could probably be recreated from the source-code, but for our purposes of analyzing user-generated text there is really no need for them.

Some information about the user we can glean. For example we know that a gender look-up table must exist because we see user profiles indicating genders of 1 or 2. We also see 0, and of course the NULL gender.

ashley madison genders

We do not know which is male and which female but we can infer which is which by looking at the text and keywords users have entered. Phrases such as “I am a girl“, giving us a good indicator as to what the numbers mean.

We found two users with a gender of zero, which we presumed to be test accounts and ignored.

Other references cannot be inferred though. More complex and unnormalised references exist, but there is no way we can reconstruct these. Short of asking The Impact Team to go back in and get them for us, which we don’t think they are likely to do.

We can live without those reference tables though. In time we will reconstruct many foriegn-key relationships and refine our indexes.

Opening Pandora’s Box

We don’t want to be just another profile perving web-site. We prefer to leave the individual in peace, and look at the bigger picture. What does their behaviour, now exposed to the world, tell us, if anything, about our own?

We have got enough data to identify the males and females. Thanks to our DBA’s concatenated user profile text-fields, we can now easily mine it for their fetishes, depravities, perversions, and other pecadillos.

The Ashley Madison database, as most are by now aware, is something of a Pandora’s Box. Once opened, you don’t know what will come out of it next.

For example:

drug use on ashley madison

Credit-Card Data

Building a key-relationship with the credit-card data is non-trivial. This data was not leaked as a database, rather a collection of thousands of excel spreadsheets.

They can be loaded into a database but there’s a problem. They are not all in the same format. They have different numbers of fields. It would take some time to analyze each file and transform them into a common format, so we left them out as there’s nothing to be gained.

Others are better at searching through spread-sheets for the names of B-List celebs, shaming for the purpose of, and we leave that to them.

We don’t name names, so we’ve no real interest in the credit card files. It would be easier to search them with grep anyhow. There’s no need to put these into our DB so our we left them out.

Which is presumably why Avid Life Media, owners of Ashley Madison, also left them out.

Pity they didn’t take the time to password-protect those credit-card files. Their company might even still have a future if they had. Then again, the contents of the CEO’s leaked email archive does seem to imply that the company was being run by assholes and will do the company far more harm than the loss of their users data.

Database Distribution

There are already a large number of copies of this DB in the field, supporting a broad range of applications from front-end portals, email-checkers, to twitter bots.

If you’re a DBA or a webmaster who wants to share technical tips on the database, please get in touch.

If you’re looking to find out whose name is on the list though, you’ll have to look elsewhere.

We’re not saying.

ashley madison porn logo