Create short IDs with PHP - Like YouTube or TinyURL
IDs are often numbers. Unfortunately there are only 10 digits to work with, so if you have a lot of records, IDs tend to get very lengthy. For computers that's OK. But human beings like their IDs as short as possible. So how can we make IDs shorter? Well, we could borrow characters from the alphabet and have them pose as additional digits... Alphabet to the rescue!
Other title options were:
- How to create unique short string IDs with PHP & MySQL
- Or how to create IDs similar to YouTube e.g. yzNjIBEdyww
I created this function a long time ago. Time to be nice and share.
More is Less - the 'math'
The alphabet has 26 characters. That's a lot more than 10 digits. If we also distinguish upper- and lowercase, and add digits to the bunch for the heck of it, we already have (26 x 2 + 10 =) 62 options we can use per position in the ID.
Now of course we can also add additional funny characters to 'the bunch' like - / * & # but those may cause problems in URLs and that's our target audience for now.
OK so because there are roughly 6x more characters we will use per position, IDs will get much shorter. We can just fit a lot more data in each position.
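To put a number on "much shorter", here is a minimal sketch that counts how many positions an ID needs in base 10 and in base 62. The helper name `positionsNeeded` is made up for this illustration and is not part of alphaID. It assumes a 64-bit PHP build so the example ID fits in an integer.

```php
<?php
// Hypothetical helper (not part of alphaID): count how many positions
// a non-negative integer occupies when written in the given base.
function positionsNeeded(int $number, int $base): int
{
    $positions = 1;
    // Each integer division by the base strips off one position.
    while ($number >= $base) {
        $number = intdiv($number, $base);
        $positions++;
    }
    return $positions;
}

$id = 999999999999;
echo positionsNeeded($id, 10), "\n"; // 12 positions with digits only
echo positionsNeeded($id, 62), "\n"; // 7 positions with 62 characters
```

Integer division is used here on purpose: it stays exact, whereas a logarithm-based count can be off by one near exact powers of the base.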
This is basically what URL shortening services like tinyurl, is.gd, or bit.ly do. Similar IDs can also be found at YouTube: http://www.youtube.com/watch?v=yzNjIBEdyww
Convert your IDs
Unlike database servers, webservers are easy to scale, so you can let them do a bit of converting to ease the life of your users, while keeping your database fast with numbers (MySQL really likes them plain numbers ; ).
To do the conversion I've written a PHP function that can translate big numbers to short strings and vice versa. I call it: alphaID.
The resulting string is not hard to decipher, but it can be a very nice feature to make URLs or directory structures more compact and significant.
So basically:
- when someone requests rLHWfKd
- alphaID() converts it to 999999999999
- you lookup the record for id 999999999999 in your database
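The three steps above could look roughly like the sketch below. It is not the alphaID() function itself: `shortToId` is a hypothetical, stripped-down decoder (no padding, no passKey, integer arithmetic only), and the `$records` array is only a stand-in for a real database table.

```php
<?php
// Hypothetical, simplified decoder: base 62, least significant
// character first, no padding and no passKey support.
function shortToId(string $short): int
{
    $index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $base  = strlen($index);
    $id    = 0;
    // Walk from the last character (most significant) to the first.
    for ($i = strlen($short) - 1; $i >= 0; $i--) {
        $value = strpos($index, $short[$i]);
        if ($value === false) {
            throw new InvalidArgumentException('Unexpected character in ID');
        }
        $id = $id * $base + $value;
    }
    return $id;
}

// Stand-in for a database table keyed by numeric ID (assumption).
$records = [64 => 'Some video title'];

$requested = 'cb';                  // step 1: the short ID from the URL
$id = shortToId($requested);        // step 2: 1 * 62 + 2 = 64
echo $records[$id] ?? 'not found', "\n"; // step 3: look up by number
```

The database only ever sees the plain number, so the lookup stays a normal primary key query.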
Source
<?php
/**
 * Translates a number to a short alphanumeric version
 *
 * Translates any number up to 9007199254740992
 * to a shorter version in letters e.g.:
 * 9007199254740989 --> PpQXn7COf
 *
 * specifying the second argument true, it will
 * translate back e.g.:
 * PpQXn7COf --> 9007199254740989
 *
 * this function is based on any2dec && dec2any by
 * fragmer[at]mail[dot]ru
 * see: http://nl3.php.net/manual/en/function.base-convert.php#52450
 *
 * If you want the alphaID to be at least 3 letters long, use the
 * $pad_up = 3 argument
 *
 * In most cases this is better than totally random ID generators
 * because this can easily avoid duplicate ID's.
 * For example if you correlate the alpha ID to an auto incrementing ID
 * in your database, you're done.
 *
 * The reverse is done because it makes it slightly more cryptic,
 * but it also makes it easier to spread lots of IDs in different
 * directories on your filesystem. Example:
 * $part1 = substr($alpha_id,0,1);
 * $part2 = substr($alpha_id,1,1);
 * $part3 = substr($alpha_id,2,strlen($alpha_id));
 * $destindir = "/".$part1."/".$part2."/".$part3;
 * // by reversing, directories are more evenly spread out. The
 * // first 26 directories already occupy 26 main levels
 *
 * more info on limitation:
 * - http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/165372
 *
 * if you really need this for bigger numbers you probably have to look
 * at things like: http://theserverpages.com/php/manual/en/ref.bc.php
 * or: http://theserverpages.com/php/manual/en/ref.gmp.php
 * but I haven't really dug into this. If you have more info on those
 * matters feel free to leave a comment.
 *
 * @author    Kevin van Zonneveld <kevin@vanzonneveld.net>
 * @author    Simon Franz
 * @author    Deadfish
 * @copyright 2008 Kevin van Zonneveld (http://kevin.vanzonneveld.net)
 * @license   http://www.opensource.org/licenses/bsd-license.php New BSD Licence
 * @version   SVN: Release: $Id: alphaID.inc.php 344 2009-06-10 17:43:59Z kevin $
 * @link      http://kevin.vanzonneveld.net/
 *
 * @param mixed   $in      String or long input to translate
 * @param boolean $to_num  Reverses translation when true
 * @param mixed   $pad_up  Number or boolean pads the result up to a specified length
 * @param string  $passKey Supplying a password makes it harder to calculate the original ID
 *
 * @return mixed string or long
 */
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
    $index = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if ($passKey !== null) {
        // Although this function's purpose is to just make the
        // ID short - and not so much secure,
        // with this patch by Simon Franz (http://blog.snaky.org/)
        // you can optionally supply a password to make it harder
        // to calculate the corresponding numeric ID

        for ($n = 0; $n<strlen($index); $n++) {
            $i[] = substr( $index,$n ,1);
        }

        $passhash = hash('sha256',$passKey);
        $passhash = (strlen($passhash) < strlen($index))
            ? hash('sha512',$passKey)
            : $passhash;

        for ($n=0; $n < strlen($index); $n++) {
            $p[] = substr($passhash, $n ,1);
        }

        array_multisort($p, SORT_DESC, $i);
        $index = implode($i);
    }

    $base = strlen($index);

    if ($to_num) {
        // Digital number <<-- alphabet letter code
        $in  = strrev($in);
        $out = 0;
        $len = strlen($in) - 1;
        for ($t = 0; $t <= $len; $t++) {
            $bcpow = bcpow($base, $len - $t);
            $out   = $out + strpos($index, substr($in, $t, 1)) * $bcpow;
        }

        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $out -= pow($base, $pad_up);
            }
        }
        $out = sprintf('%F', $out);
        $out = substr($out, 0, strpos($out, '.'));
    } else {
        // Digital number -->> alphabet letter code
        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $in += pow($base, $pad_up);
            }
        }

        $out = "";
        for ($t = floor(log($in, $base)); $t >= 0; $t--) {
            $bcp = bcpow($base, $t);
            $a   = floor($in / $bcp) % $base;
            $out = $out . substr($index, $a, 1);
            $in  = $in - ($a * $bcp);
        }
        $out = strrev($out); // reverse
    }

    return $out;
}
?>
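The directory example from the docblock can be wrapped in a small helper. This is only a sketch: the function name `alphaIdToPath` is made up here, and it assumes the short ID is at least three characters long (which the $pad_up argument can guarantee).

```php
<?php
// Hypothetical helper: turn a short ID into a nested directory path,
// following the substr() example in the alphaID docblock.
// Assumes $alphaId has at least three characters.
function alphaIdToPath(string $alphaId): string
{
    $part1 = substr($alphaId, 0, 1); // first character  -> top-level dir
    $part2 = substr($alphaId, 1, 1); // second character -> second-level dir
    $part3 = substr($alphaId, 2);    // everything else  -> leaf name
    return '/' . $part1 . '/' . $part2 . '/' . $part3;
}

echo alphaIdToPath('PpQXn7COf'), "\n"; // /P/p/QXn7COf
```

Because alphaID reverses its output, the first characters are the fastest-changing ones, so consecutive numeric IDs land in different top-level directories.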
Example
Running:
alphaID(9007199254740989);
will return 'PpQXn7COf' and:
alphaID('PpQXn7COf', true);
will return '9007199254740989'
Easy right?
More features
- There also is an optional third argument: $pad_up. This enables you to make the resulting alphaID at least X characters long.
- You can support even more characters (making the resulting alphaID even smaller) by adding characters to the $index var at the top of the function body.
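To show the idea behind $pad_up in isolation, here is a simplified sketch of an encoder with a minimum length. It is not the alphaID() function: `encodeWithMinLength` is a hypothetical name, it uses integer arithmetic only, and it leaves out the passKey feature. Like alphaID, it shifts the number up by base^(length - 1) so the result always has at least that many positions.

```php
<?php
// Hypothetical, simplified encoder illustrating the $pad_up idea.
function encodeWithMinLength(int $id, int $minLength = 1): string
{
    $index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $base  = strlen($index);

    if ($minLength > 1) {
        // Shift the number past base^(minLength - 1); a decoder has to
        // subtract the same amount again.
        $id += $base ** ($minLength - 1);
    }

    $out = '';
    do {
        $out .= $index[$id % $base]; // least significant position first
        $id   = intdiv($id, $base);
    } while ($id > 0);

    return $out;
}

echo encodeWithMinLength(1), "\n";    // b
echo encodeWithMinLength(1, 5), "\n"; // baaab
```

Adding characters to `$index` raises `$base`, which is why a longer index yields shorter IDs for the same number.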
Bonus
Thanks to some wonderful contributions in the comment section, here are some interesting updates & additions:
Pro tip
You may want to remove vowels (a, e, o, u, i) from $index
to avoid combinations that result in 'penis' or other dirty words
that could upset your customers.
You can also use the $pad_up argument to enforce a minimum length
of 5 characters to avoid 'nsfw' and 'wtf'.
Thanks to William for pointing this out ;)
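A quick way to build such a vowel-free index is sketched below. The variable name `$safeIndex` is just for illustration; the same index has to be used for both encoding and decoding.

```php
<?php
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Strip vowels in both lower- and uppercase.
$vowels    = ['a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U'];
$safeIndex = str_replace($vowels, '', $index);

echo $safeIndex, "\n";         // bcdfghjklmnpqrstvwxyz0123456789BCDFGHJKLMNPQRSTVWXYZ
echo strlen($safeIndex), "\n"; // 52
```

The price is a smaller base (52 instead of 62), so IDs can become slightly longer.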
Postgres Implementation
Thanks to William as well:
CREATE OR REPLACE FUNCTION string_to_bits(input_text TEXT)
RETURNS TEXT AS $$
DECLARE
    output_text TEXT;
    i INTEGER;
BEGIN
    output_text := '';
    FOR i IN 1..char_length(input_text) LOOP
        output_text := output_text || ascii(substring(input_text FROM i FOR 1))::bit(8);
    END LOOP;
    RETURN output_text;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION id_to_sid(id INTEGER)
RETURNS TEXT AS $$
DECLARE
    output_text TEXT;
    i INTEGER;
    index TEXT[];
    bits TEXT;
    bit_array TEXT[];
    input_text TEXT;
BEGIN
    input_text := id::TEXT;
    output_text := '';
    index := string_to_array('0,d,A,3,E,z,W,m,D,S,Q,l,K,s,P,b,N,c,f,j,5,I,t,C,i,y,o,G,2,r,x,h,V,J,k,-,T,w,H,L,9,e,u,X,p,U,a,O,v,4,R,B,q,M,n,g,1,F,6,Y,_,8,7,Z', ',');
    bits := string_to_bits(input_text);
    IF length(bits) % 6 <> 0 THEN
        bits := rpad(bits, length(bits) + 6 - (length(bits) % 6), '0');
    END IF;
    FOR i IN 1..((length(bits) / 6)) LOOP
        IF i = 1 THEN
            bit_array[i] := substring(bits FROM 1 FOR 6);
        ELSE
            bit_array[i] := substring(bits FROM 1 + (i - 1) * 6 FOR 6);
        END IF;
        output_text := output_text || index[bit_array[i]::bit(6)::integer + 1];
    END LOOP;
    RETURN output_text;
END;
$$ LANGUAGE plpgsql;
Java Implementation
Thanks to Ant Kutschera there also is a Java version. Click on his name for the external link to it.
JavaScript Implementation
Thanks to Even Simon, there's a JavaScript implementation. You will also find a PHP version there that implements the encode & decode functions as separate methods in a class.
/**
 * Javascript AlphabeticID class
 * (based on a script by Kevin van Zonneveld <kevin@vanzonneveld.net>)
 *
 * Author: Even Simon <even.simon@gmail.com>
 *
 * Description: Translates a numeric identifier into a short string and backwards.
 *
 * Usage:
 *   var str = AlphabeticID.encode(9007199254740989); // str = 'fE2XnNGpF'
 *   var id = AlphabeticID.decode('fE2XnNGpF'); // id = 9007199254740989;
 **/
var AlphabeticID = {
  index:'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',

  /**
   * @function AlphabeticID.encode
   * @description Encode a number into short string
   * @param integer
   * @return string
   **/
  encode:function(_number){
    if('undefined' == typeof _number){
      return null;
    }
    else if('number' != typeof(_number)){
      throw new Error('Wrong parameter type');
    }

    var ret = '';

    for(var i=Math.floor(Math.log(parseInt(_number))/Math.log(AlphabeticID.index.length));i>=0;i--){
      ret = ret + AlphabeticID.index.substr((Math.floor(parseInt(_number) / AlphabeticID.bcpow(AlphabeticID.index.length, i)) % AlphabeticID.index.length),1);
    }

    return ret.reverse();
  },

  /**
   * @function AlphabeticID.decode
   * @description Decode a short string and return number
   * @param string
   * @return integer
   **/
  decode:function(_string){
    if('undefined' == typeof _string){
      return null;
    }
    else if('string' != typeof _string){
      throw new Error('Wrong parameter type');
    }

    var str = _string.reverse();
    var ret = 0;

    for(var i=0;i<=(str.length - 1);i++){
      ret = ret + AlphabeticID.index.indexOf(str.substr(i,1)) * (AlphabeticID.bcpow(AlphabeticID.index.length, (str.length - 1) - i));
    }

    return ret;
  },

  /**
   * @function AlphabeticID.bcpow
   * @description Raise _a to the power _b
   * @param float _a
   * @param integer _b
   * @return string
   **/
  bcpow:function(_a, _b){
    return Math.floor(Math.pow(parseFloat(_a), parseInt(_b)));
  }
};

/**
 * @function String.reverse
 * @description Reverse a string
 * @return string
 **/
String.prototype.reverse = function(){
  return this.split('').reverse().join('');
};
Python Implementation
Thanks to wessite, there's a Python implementation.
import math


# The methods below reference self, so they are assumed to live in a
# class; the class name AlphaID is an assumption, not part of the
# original snippet.
class AlphaID:
    ALPHABET = "bcdfghjklmnpqrstvwxyz0123456789BCDFGHJKLMNPQRSTVWXYZ"
    BASE = len(ALPHABET)
    MAXLEN = 6

    def encode_id(self, n):
        # Pad so that the result is always MAXLEN characters long
        pad = self.MAXLEN - 1
        n = int(n + pow(self.BASE, pad))

        s = []
        t = int(math.log(n, self.BASE))
        while True:
            bcp = int(pow(self.BASE, t))
            a = int(n / bcp) % self.BASE
            s.append(self.ALPHABET[a:a+1])
            n = n - (a * bcp)
            t -= 1
            if t < 0:
                break

        return "".join(reversed(s))

    def decode_id(self, n):
        n = "".join(reversed(n))
        s = 0
        l = len(n) - 1
        t = 0
        while True:
            bcpow = int(pow(self.BASE, l - t))
            s = s + self.ALPHABET.index(n[t:t+1]) * bcpow
            t += 1
            if t > l:
                break

        # Remove the padding added in encode_id
        pad = self.MAXLEN - 1
        s = int(s - pow(self.BASE, pad))

        return int(s)
HaXe Implementation
Thanks to Andy Li, there's a HaXe implementation.
/**
 * HaXe version of AlphabeticID
 * Author: Andy Li <andy@onthewings.net>
 * ported from...
 *
 * Javascript AlphabeticID class
 * Author: Even Simon <even.simon@gmail.com>
 * which is based on a script by Kevin van Zonneveld <kevin@vanzonneveld.net>)
 *
 * Description: Translates a numeric identifier into a short string and backwards.
 * http://kevin.vanzonneveld.net/techblog/article/create_short_ids_with_php_like_youtube_or_tinyurl/
 **/
class AlphaID {
    static public var index:String = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';

    static public function encode(_number:Int):String {
        var strBuf = new StringBuf();
        var i = 0;
        var end = Math.floor(Math.log(_number)/Math.log(index.length));
        while(i <= end) {
            strBuf.add(index.charAt((Math.floor(_number / bcpow(index.length, i++)) % index.length)));
        }
        return strBuf.toString();
    }

    static public function decode(_string:String):Int {
        var str = reverseString(_string);
        var ret = 0;
        var i = 0;
        var end = str.length - 1;
        while(i <= end) {
            ret += Std.int(index.indexOf(str.charAt(i)) * (bcpow(index.length, end-i)));
            ++i;
        }
        return ret;
    }

    inline static private function bcpow(_a:Float, _b:Float):Float {
        return Math.floor(Math.pow(_a, _b));
    }

    inline static private function reverseString(inStr:String):String {
        var ary = inStr.split("");
        ary.reverse();
        return ary.join("");
    }
}
tags: php, programming, mysql, database, youtube, tinyurl
category: Programming
#36. Kevin on 10 June 2010
@ Tim: You're not supposed to do anything. But I would just use numeric indexes to keep the database speedy, and use appservers (easiest to scale in any infra) to take care of the conversion. If you don't want small alphaIDs like 'c', try the pad argument.
#35. Tim on 23 May 2010
I'm building a new site and don't have big numbers yet. So if I set my ID column to make up a big number, such as INT(16) unsigned zerofill and INSERT a record, I get 0000000000000001 as an id. If I return that using the alphaID() function I get 'c'. Hardly usable. :)
Are we supposed to add a second id column, alpha_id INT(16), instead and have PHP generate a random number, then INSERT IGNORE and hope that the number already isn't in the db? Or are we supposed to store the alphaID "PpQXn7COf" in the db and lookup on that? If so, doesn't that then negate the speed of a numerical lookup? Also, that just goes back to having PHP generate a rand number, right?
#34. Stephen on 22 May 2010
I thought I might add to it. If you have a short number and the pad_up say alphaID('9', false, 7, 'passkeys') you get a number like 366666P.
To avoid having something uniform like this I multiply the input in the
(Digital number -->> alphabet letter code) branch
by a large prime number, e.g.: $in = $in * 89527;
and you need to do the rev also so at the end of
(Digital number <<-- alphabet letter code):
$out = $out / 89527;
This will produce a more cryptic output like YDPH66P.
As for the prime number I just went and picked one from http://www.mathsisfun.com/numbers/prime-number-lists.html
#33. Kevin on 15 May 2010
@ esco: Yeah you need the bcmath stuff.
@ Webdesigner: while it was not intended to encrypt or hide the actual id so much as to make it smaller/easier to read, Simon Franz patched the function to support passwords. the 4th argument.
#32. esco on 07 May 2010
#31. Webdesigner on 02 May 2010
#30. esco on 01 May 2010
When I try to debug a bit amateuristic, I think it breaks at:
Can someone help?
#29. SD (Aspherical) on 30 April 2010
Right now, I just put the encoded link in a database table, but I might explore using the decoder in the future.
#28. Kevin on 24 April 2010
#27. Thiet ke website on 19 April 2010
#26. Andy Li on 06 April 2010
Here it is:
http://gist.github.com/358018
#25. Kevin on 04 April 2010
#24. wessite on 31 March 2010
I needed this in python to use on Google Appengine, here's the code:
#23. Kevin on 27 March 2010
#22. William on 27 March 2010
In my last comment I suggested to remove all vowels from $index to prevent unfriendly / dirty words. With that in mind I would like to add another suggestion: to make the string at least 5 characters long to prevent getting id's like: 'wtf', 'nsb', 'nsfw', etc ;-)
I also made a PostgreSQL version of Kevin's script :-)
It doesn't create a random unique string, but it will convert numbers to a string. It's an edited version of base64 encoding (a URL-safe version without characters like '/' and '=').
I hope you people find any use for it.
CREATE OR REPLACE FUNCTION string_to_bits(input_text TEXT)
RETURNS TEXT AS $$
DECLARE
    output_text TEXT;
    i INTEGER;
BEGIN
    output_text := '';
    FOR i IN 1..char_length(input_text) LOOP
        output_text := output_text || ascii(substring(input_text FROM i FOR 1))::bit(8);
    END LOOP;
    return output_text;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION id_to_sid(id INTEGER)
RETURNS TEXT AS $$
DECLARE
    output_text TEXT;
    i INTEGER;
    index TEXT[];
    bits TEXT;
    bit_array TEXT[];
    input_text TEXT;
BEGIN
    input_text := id::TEXT;
    output_text := '';
    index := string_to_array('0,d,A,3,E,z,W,m,D,S,Q,l,K,s,P,b,N,c,f,j,5,I,t,C,i,y,o,G,2,r,x,h,V,J,k,-,T,w,H,L,9,e,u,X,p,U,a,O,v,4,R,B,q,M,n,g,1,F,6,Y,_,8,7,Z', ',');
    bits := string_to_bits(input_text);
    IF length(bits) % 6 <> 0 THEN
        bits := rpad(bits, length(bits) + 6 - (length(bits) % 6), '0');
    END IF;
    FOR i IN 1..((length(bits) / 6)) LOOP
        IF i = 1 THEN
            bit_array[i] := substring(bits FROM 1 FOR 6);
        ELSE
            bit_array[i] := substring(bits FROM 1 + (i - 1) * 6 FOR 6);
        END IF;
        output_text := output_text || index[bit_array[i]::bit(6)::integer + 1];
    END LOOP;
    return output_text;
END;
$$ LANGUAGE plpgsql;
Have a nice day!
William
#21. Kevin on 24 March 2010
@ Even Simon: Good stuff man!
@ both: I'll update the article with clear references to your comments soon
#20. Even Simon on 23 March 2010
I've used your code for my website, let me say this is some excellent work. Anyhow I also needed the same functionality on the client-side (JavaScript) so I had to write my own. Here it is:
Plus I've rewritten your PHP function into a PHP class to make it more suitable for my code:
Have a nice day. $))
-Simon
#19. William on 22 March 2010
I've got 1 small piece of advice: remove all vowels (a, e, o, u, i) from $index, otherwise one day one of your customers will ask you why his username (or whatever) is 'penis' (or another unfriendly/dirty word) ;-).
#18. Kevin on 21 February 2010
#17. Ant Kutschera on 13 February 2010
ive done the same thing independently using java.
http://blog.maxant.co.uk/pebble/2010/02/02/1265138340000.html
im not sure what this type of encoding is really called...
another application for it is where you want to provide users with a pin which they can share among friends. but you dont want anyone to guess the pin. so you take your primary key for the relevant thing which is being shared, and append a 4 digit random number to the end, before encoding your big number. the PIN you distribute is the encoded shorter version.
#16. Kevin on 07 January 2010
#15. Catalin on 17 December 2009
#14. Kevin on 13 December 2009
http://github.com/kvz/kvzlib/commit/1bf020eb82fcfac67353219817b3813e2df325e5
#13. Deadfish on 08 December 2009
replace log10($in) / log10($base) with log($in, $base) and $a = floor($in / $bcp) with $a = floor($in / $bcp) % $base. That will fix the bug with alphaID(238328);
#12. Deadfish on 07 December 2009
#11. Deadfish on 07 December 2009
#10. Kevin on 25 October 2009
http://github.com/kvz/kvzlib/commit/323e9c3bb3e489150bdddea51a785e1e931003d7
#9. Tanzmusik on 11 October 2009
The only improvement i mean is to modify the code by adding or removing some letters before use. If you do not modify, then everyone else can reveal your primary key structure.
#8. Kevin on 09 October 2009
#7. BnoL on 17 September 2009
But I think every encode script needs to have a "password key" so that no one else can decode your ID :) (unless he/she knows your password key).
#6. Topbit on 30 July 2009
There is also a number of other functions there that will do larger number bases - such as 62, using similar techniques as the above post.
#5. Marcelo on 11 July 2009
#4. Kevin on 18 June 2009
#3. Gerrit on 18 June 2009
in your converter.
Follow my link for my version of the converter.
#2. devnic on 12 June 2009
#1. Đỗ Nam Khánh on 11 June 2009