Alexa is worth nothing

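Quite a few companies probably take their Alexa ranking very seriously and treat Alexa as an important tool for analyzing business data. That includes us: watching my colleagues in marketing sweat over comparisons of various rankings and convert Traffic Rank into a site's actual visitor count, I couldn't help wondering: is this worth the effort? Can an Alexa ranking really serve as a basis for decisions? Does it mean anything at all?

Let's look at how Alexa's data is produced: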

 "Alexa’s traffic rankings are based on the usage patterns of Alexa Toolbar users over a rolling 3 month period. A site’s ranking is based on a combined measure of reach and pageviews. Reach is determined by the number of unique Alexa users who visit a site on a given day."

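Alexa can only record the activity of visitors who have the Alexa Toolbar installed in their browser; visitors without the toolbar are not counted at all. This approach is also very easy to game. If I get a few friends to install the Alexa Toolbar along with me, pushing a site into the top 100,000 is easy. With enough patience and enough people, getting into the top 10,000 is not hard either. (My avast flags the toolbar as containing a trojan, though, so anyone who wants to try this should be careful :D)

Some advertising systems, such as Text Link Ads, tie the price of an ad slot to the Alexa ranking: the higher the rank, the higher the price. Cheating that money out of them is not hard either: if you don't mind cheating, all it takes is patience and time.

Alexa once asked Firefox to build its toolbar into the browser, and to this day it has not entirely given up trying. I am not at all surprised that Firefox turned the request down; it would have been far stranger if they had accepted.

To push its toolbar, Alexa has played a less than honorable role, and the toolbar shows up in quite a few pieces of rogue software. As browser security improves and users become more aware of security and self-protection, the toolbar's share of clients is bound to shrink, and the ranking data will become less and less representative, unless Alexa finds a better way to solve these problems.

If you want to analyze real visitor data, tools like Google Analytics or LogMicroscope are the right answer. Never trust Alexa's data.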

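Facebook

Facebook has opened registration, so I tried it out. I don't much like the built-in photo album; it would be nice if it could work with an external album service such as Flickr.

"Your Facebook Badges" is a feature I rather like:

Dean Lee's Facebook profile

You can choose what the badge displays and put it on any other site as your personal link.

According to reports, Facebook plans to insert ad items into users' RSS feeds. When a subscriber clicks one of these ads, all of that user's friends are notified and can choose to join the advertiser as well and publish its ads in their own RSS feeds.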

When one user clicks on an advertisement in their feeds, all of that user’s friends will be notified that the ad was clicked on and will be given an opportunity to join a group led by the advertiser, apparently. Mike Murphy, Facebook’s chief revenue officer, told MediaWeek the following: “Up until now, most advertising on social network sites hasn’t leveraged social networking behavior…This offers a viral opportunity that is unique for advertisers that is not disruptive.”

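It sounds like a good idea, but it is risky and may not make money. Most people want a "clean" site and don't like ads on it. The negligible income from putting ads in your own RSS feed is not much of a temptation, and it may well annoy subscribers.

Either way, it is a good complement to traditional advertising (text ads, banner ads, and so on), and it gives advertising some of the traits of social networking. On that point alone, it is a step forward.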

AJAX-the answer to webmail

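Webmail may be the most important application for really showing what AJAX can do. Most people spend a good part of each day handling mail; I spend at least three to four hours a day reading and processing mine. When we spend that much time on a single task, even a small improvement can raise productivity considerably.

AJAX gives us more ways to improve webmail, and current webmail services are all moving toward it. Gmail and Windows Live Mail have both made many worthwhile attempts here, but plenty of problems remain.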

Continue reading “AJAX-the answer to webmail”

ZIKI-new social network

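Like other social networks, ZIKI lets you easily connect with other people on the web, based on shared interests and on what each of you currently cares about.

That alone doesn't seem to set ZIKI apart from other social networks, such as wealink in China.

Although ZIKI itself offers only social networking, it also lets you gather the pieces of your digital life scattered across different places, show them in one central spot, and share them with friends: your blog, del.icio.us, Flickr, links, videos, and more. This is exactly what social networks like wealink lack (and I find it strange that wealink is so stingy with the size limit on uploaded avatars: a 52*52 avatar is barely enough to tell a person from a beast). If the only reason to log in is to look up contacts and make connections, then logging in once a week is more than enough for me. ZIKI is different: it offers more than connections, and its tag-based way of searching for connections is in some respects more flexible and free.

Better still, while providing this complete toolset for building your own social web, ZIKI keeps its interface very simple. Navigation between pages is easy and obvious, with no clutter. From this angle alone, ZIKI is a fine example of what a social network should be.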

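Ten Years

The company is ten years old. Looking back on my own ten years, I think of many people and many things.

I think of Huang Bolin (黄伯林). I first met him in Shanghai ten years ago: he was carrying a huge suitcase, and he started grinning as soon as he saw Zhong Cheng (钟诚) and me from a distance. He was always smiling; I never saw him angry, and he was peaceful at the very end too, lying there quietly as if asleep. I could not believe he was gone until the moment the coffin lid closed. Only then did I understand that we were truly parted forever. Now there is a lonely grave a thousand miles away, and nowhere to speak of my sorrow.

I think of Zhong Cheng, Erduo (耳朵), Hu Mao (胡茂), and that lovely group of girls in the development department. I think of the days of making braised pork in Shanghai; of wandering around stationery shops; of carrying a big pot to buy wontons; of grabbing whatever was at hand to go and fight street thugs; of the days at Dongfang Shanzhuang (东方山庄); of going to bars together; of Yonghe soy milk together; of going out in the middle of the night to bail Erduo out; of sleeping on the office floor and not going home for months; and of all those days when we were young together.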

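A Trip Through Space Together

I installed Celestia, an open-source 3D space simulator, and took a tour of the solar system. It is quite fun, and it at least partly fulfilled some unrealized childhood dreams. The controls resemble Google Earth. You can also do nothing at all and choose Help -> Run Demo from the menu to watch a demonstration. If the base program isn't enough, you can download more 3D models and textures from the Motherlode.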

SEO Tips

SEO Egghead has collected 21 SEO tips. A few of the more interesting ones are excerpted below:

* Hiding text using similar colors and background colors can actually be worse than using the same colors.
* If you sell links, Matt says you should use link condoms [rel=”nofollow”]. Otherwise your reputation may fall. I assume this means they will devalue your outbound links.
* Assign unique, descriptive title tag and headings to every page.
* Use user-friendly URLs like “african-elephants.html,” and not “343432ffsdfsdfdfasffgddddd.html.” Don’t overdo it either — african-elephants-and-their-habitats-etc-etc-etc-etc.html.
* Minimize the number of URL parameters — 1-2 parameters if possible.

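On reflection, all of these search engine optimization techniques also improve the user experience. Hidden text is invisible to users, so to the search engine it means nothing and will not show up in search results. Descriptive titles and URLs explain a page's content by themselves. Redirects, for their part, slow down page loading and leave users waiting in pain.

So when you don't know how to optimize a page for search engines, trying to think from your visitors' position may be the better choice.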

Convert Unicode To UTF8

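#include <windows.h>   // WCHAR
#include <stdlib.h>    // malloc

// Convert a NUL-terminated UTF-16 string to UTF-8.
// Returns a malloc'd buffer that the caller must free(), or NULL on failure.
// Each WCHAR is encoded on its own, so surrogate pairs (code points above
// U+FFFF) are not combined into 4-byte sequences.
char* __stdcall UnicodeToUtf8( const WCHAR* wstr )
{
    const WCHAR* w;

    // First pass: count the bytes needed
    int len = 0;
    for ( w = wstr; *w; w++ ) {
        if ( *w < 0x0080 ) len++;
        else if ( *w < 0x0800 ) len += 2;
        else len += 3;
    }

    unsigned char* szOut = ( unsigned char* )malloc( len+1 );

    if ( szOut == NULL )
        return NULL;

    // Second pass: emit 1, 2 or 3 bytes per character
    int i = 0;
    for ( w = wstr; *w; w++ ) {
        if ( *w < 0x0080 )
            szOut[i++] = ( unsigned char ) *w;
        else if ( *w < 0x0800 ) {
            szOut[i++] = 0xc0 | (( *w ) >> 6 );
            szOut[i++] = 0x80 | (( *w ) & 0x3f );
        }
        else {
            szOut[i++] = 0xe0 | (( *w ) >> 12 );
            szOut[i++] = 0x80 | (( ( *w ) >> 6 ) & 0x3f );
            szOut[i++] = 0x80 | (( *w ) & 0x3f );
        }
    }

    szOut[ i ] = '\0';
    return ( char* )szOut;
}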
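
The function above encodes every 16-bit unit separately, so a character outside the Basic Multilingual Plane, which UTF-16 stores as a surrogate pair, would not come out as one 4-byte UTF-8 sequence. Below is a minimal portable sketch of a variant that combines surrogate pairs first. The name Utf16ToUtf8 and the use of std::u16string are choices made for this illustration, not part of the original code, and unpaired surrogates are simply replaced with U+FFFD here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the original post): encode UTF-16 as UTF-8,
// combining surrogate pairs into single code points before encoding.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: valid only when a low surrogate follows.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // stray low surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Returning a std::string avoids the manual length pass and the malloc/free contract of the original, at the cost of possible reallocations while the string grows.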

Riya – Visual Search the web

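When I first tried Riya I thought it was just an ordinary image search engine. A few minutes later I found that it was far more than that.

Riya uses image recognition to "recognize" the faces or the content in each photo and sort the photos into categories automatically, such as flowers, grass, insects, and fish. With Riya you can find every photo of yourself on the web: upload your photos with the uploader Riya provides, and Riya analyzes the features of the people in them and searches out every photo on the internet that matches those features.

It sounds incredible, but they really have done it.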

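Search Engines: Where To From Here?

Search engines have changed how we think about problems and look for answers: "before asking anyone else, I ask the search engine first." I, at least, have grown used to thinking this way.

But current search technology can answer only simple questions, and you may have to filter the mass of results one by one to find what is actually useful to you. Technology can never replace people, and computers can never answer questions that are too complex.

How can the two be combined? Yahoo Answers may be one possible answer. Before asking the computer, why not let other people on the internet answer your question? Yahoo Answers changes the way of finding answers to: "before asking the search engine, I ask other people on the internet first." This not only saves time but also gets you a more detailed and more accurate answer. Google is trying something similar. The difference is that Google hires a group of experts to answer questions, whereas Yahoo has a more open structure in which anyone can take part in answering other people's questions. Whoever wins in the end, the people who benefit are those looking for answers on the internet.

But is this approach really the ultimate solution? I don't think so. People are always unreliable, and people on the internet carry one more layer of uncertainty than the people around you, so their answers are not necessarily reliable either. The detour a wrong answer sends you on may cost many times more than searching for the answer yourself. Perhaps this is just one available, "seemingly better" option under current technology. A search engine that can mine and organize the internet's information in depth, answer complex questions intelligently, remember your interests and search style, and keep learning your search habits and goals may be the better solution.

We make computers smarter and leave the rest to them. Perhaps that is the ultimate solution for this world.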

Netvibes – Personalized HomePage

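Netvibes is a personalized home page similar to iGoogle and My Yahoo. It may be the best product of its kind at the moment.

Online home pages are nothing new, but this small company founded by a Frenchman has overtaken predecessors like Google and Yahoo in a short time. The Netvibes interface is very clean, and you can customize the content almost at will (you can also add tabs). Subscribed feeds and news update in real time, with no need to refresh the page by hand again and again. After I set up a POP mailbox, the subject lines of my Chinese mail displayed without problems, and the mailbox can be managed with keyboard shortcuts.

As long as it has creativity, a small company always has a chance.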

wp_url_rewriting:URL Rewriting for WordPress under IIS (V 2.1)

This ISAPI filter removes index.php from WordPress permalinks on Windows IIS, making your permalinks prettier and more SEO friendly.

Because IIS does not support the mod_rewrite module for rewriting URLs, if you run WordPress on Windows IIS and use permalinks, you always get something like this:

http://www.yourdomain.com/index.php/2006/09/02/…../

This ISAPI filter removes the ugly "index.php" from the URL automatically. It is easy to use: just install it, and no further configuration is necessary.

download wp_url_rewriting
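
As a rough illustration of what such a filter has to do, the rewrite can be sketched with plain prefix checks and no regular expressions. This is not the filter's actual source: the function names and the exact rules are assumptions, with the wp-admin and wp-content exclusions taken from the version 1.1 notes in the revision history. Query strings are ignored for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, not the filter's real code: map a pretty permalink
// path back to the /index.php/... form that WordPress serves on IIS.
static bool StartsWith(const std::string& s, const std::string& prefix)
{
    return s.compare(0, prefix.size(), prefix) == 0;
}

std::string RewriteForWordPress(const std::string& path)
{
    // Only absolute paths are rewritten.
    if (path.empty() || path[0] != '/') {
        return path;
    }
    // Leave admin pages, static content and already-rewritten URLs alone.
    if (StartsWith(path, "/wp-admin") || StartsWith(path, "/wp-content") ||
        StartsWith(path, "/index.php")) {
        return path;
    }
    // Assumption: a dot in the last path segment means a real file
    // (favicon.ico, robots.txt, ...), which must not be rewritten.
    const std::size_t lastSlash = path.rfind('/');
    if (path.find('.', lastSlash) != std::string::npos) {
        return path;
    }
    return "/index.php" + path;
}
```

A real ISAPI filter would apply a function like this to the request URL during its pre-processing notification and hand the rewritten path back to IIS; that plumbing is omitted here.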

Key benefits:

  • Speed.

    This URL rewriting engine is written in C++. Because it is designed specifically for WordPress, its logic is very simple and it uses no regular expressions, so it is much faster than other rewriting engines on the IIS platform.
  • Supports multiple WordPress sites on one server.
  • No configuration is needed.

    This URL rewriting engine automatically detects all WordPress sites on your server and generates URL rewriting rules for each of them.

Live Demo

You can use my site as a live demo: navigate through it and watch the 'pretty' addresses in the browser.

Limitation

You must have administrator privileges on the target server in order to install this plugin.

Installation

1). Copy wp-url-rewriting.dll to the target machine and register it as an ISAPI filter using the IIS MMC snap-in. wp-url-rewriting.dll can be registered at either the site level or the global level. Note: if WordPress is not installed in the root directory of your site, you should set it up as a virtual directory.

2). After registering the filter, log in to your WordPress admin panel -> Options -> Permalinks and make sure you have removed index.php from your permalink structure.

3). Make a small change to the WordPress file 'link-template.php' so that paging works for categories. (NOTE: you don't need this step if you are using WordPress 2.3.0 or newer.)

Open the file /wp-includes/link-template.php and find the following code:

function get_pagenum_link($pagenum = 1) {

 global $wp_rewrite;

 $qstr = $_SERVER['REQUEST_URI'];

 $page_querystring = "paged";

 ...

}


Replace it with:

function get_pagenum_link($pagenum = 1) {

 global $wp_rewrite;

 $qstr = $_SERVER['PATH_INFO'];

 $page_querystring = "paged";

 ...

}

Revision History

  • version 2.1 – 2007-8-29
    • Fixed a bug that could cause rewriting to fail when there are many blogs under a single site.
  • version 2.0 – 2007-8-17
    • No configuration file needed. This version automatically generates URL rewriting rules for each WordPress site on the same server.
    • Supports multiple WordPress sites on the same server. You can install this filter at the global level to support multiple WordPress sites on your server.
  • version 1.1 – 2006-11-1
    • Optimized the algorithm.
    • Excluded the directories wp-admin and wp-content from the URL rewriting rules.
  • version 1.0 – 2006-9-2
    • Initial version.

This ISAPI filter has been built with the /MT switch (multithreaded, static CRT). If the plugin fails to load, download vcredist_x86.exe from Microsoft and run it on the target computer; this installs all Visual C++ libraries as shared assemblies.

Please feel free to report any bugs.

This project is licensed under GNU General Public License 2.0.

Sina: Copy the Skin, and You Lose More Than Face

Today Garry Wisemen, who leads Microsoft's Expro’s project, wrote a post on his blog titled “Sina.com steals our design and graphics”:

Sina.com steals our design and graphics

We were recently made aware (thanks to a comment on our blog by Phillipe) of a new classifieds site in China that had not only lifted our previous user interface’s look & feel, but also directly copied some of our graphics (note the cool pushpin graphic that our designer Becky created). The shocking thing is that the website in question is owned by one of China’s largest search engines called Sina.com. Couldn’t they afford to hire a designer?
Anyway, take a look for yourself at the images I’ve attached to this blog entry. Alternatively, check out the screenshots of our previous UI and then visit: http://post.sina.com.cn/v3_index.php
Let’s just say that I’m looking forward to our China launch..
– Garry

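I really blush for Sina. If it is truly so impoverished that it has to copy to make a living, it could at least copy with some skill. The UI is not everything; the creativity and inspiration behind the interface are what give a website life. China's IT industry has been copying from abroad for so many years and is still stuck at the level of copying the skin, which really is something to be ashamed of. I wonder whether anyone at Sina will blush over this.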

Box.net

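box.net offers 1GB of file storage for free. Both the interface and the operation are very simple, and you can use it without even signing up for an account. Compared with the flashy online storage systems in China, box.net is simply in a different league. When will these domestic service providers understand that "Simple Is Beautiful"?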

Google Apps for Your Domain

intermedia.net published a commentary on Google Apps for Your Domain, and I couldn't help laughing after reading it. The article says:

The Apps for your Domain key features:

    • 24×0 support. This is important because companies for whom email and schedules are mission-critical will want to know they can pick up a phone and get support 24 hours a day, 0 days per week. Google also gives the option of filling out a support form and receiving an automated response.
    • No wireless access. Where Intermedia.NET hosted Exchange gives users access to information via BlackBerry, Treo, Q or any other device, Google has bucked this trend, perhaps suggesting that wireless email is in fact a productivity-sapping distraction for employees.
    • Private data read by others. Google Apps for your Desktop again bucks the trend that businesses should not allow outsiders to read their proprietary documents and email. Businesses can rest easy knowing that Google is looking at all emails and documents.
    • Ads inside applications. Clearly, employees are more productive when their business applications stream ads for online poker sites and pills to combat ED.
    • No uptime guarantee. Rather than a predictable 99.9% uptime guarantee, such as the one offered by Intermedia.NET, Google does not provide a set percentage of the time when email will be up and running. This keeps corporate collaboration more exciting, by allowing staff to guess whether the system will be working or not.

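As the largest provider of Microsoft Exchange hosting in the United States, Intermedia is bound to be afraid of Google, but publishing such views on its official website is rather petty. It forgot to mention that Google's service is free, and people who choose Google will naturally accept these "flaws" (granting for the moment that they are flaws). Seen from another angle, it is precisely companies like Google, offering free services, that activate a large pool of potential users and energize the whole industry. In fact they also bring more prospective customers to commercial companies like Intermedia, whose sole goal is profit.

Microsoft, and the companies that depend on Microsoft for survival, should face Google's challenge more bravely. Fight like a man; don't spit.

Respect your opponent instead of belittling them. A salute to Google.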

apt-get … Segmentation Faulty Tree

I have tried both Synaptic and apt-get. Running apt-get install (anything) from the command line (as root) yields this result:

root@dean-laptop:/# apt-get install netselect
Reading package lists… Done
Segmentation faulty tree… 0%

I’ve done some googling and it seems this bug has been seen before, but no intelligent handling has been added to apt. See Bug#84277, where Jason Gunthorpe writes:

> apt-get segfaults w/out (in my opinion) any reason:

This has always been traced back to file corruption in /var/cache/apt/*.bin

If you can erase those files and run the apt command and have it work then
that is definately the problem.

Nobody has ever been able to reproduce it, unless they have buggy hardware
:>

Jason

So I suspected corrupted data files. I deleted /var/cache/apt/*.bin and then ran “apt-get update” to reset apt, which indeed fixed the error.

But that’s very strange: my laptop does not have buggy hardware, and apt-get had been working fine from the start.

Compile the new 2.6.17 Linux kernel for IBM ThinkPad T43

I just finished compiling the newest 2.6.17 Linux kernel and I am getting much better performance. In what follows, I will show you how to compile and configure the latest kernel for IBM Thinkpad T43.

Before you begin, you will need to get a kernel.
Download the 2.6.17 kernel and its performance patch: The 2.6.17 kernel
Latest Kernel Patch

1. Install needed utilities to configure the kernel
sudo apt-get install build-essential bin86 kernel-package libqt3-headers libqt3-mt-dev
2. Now we are going to copy the kernel tarball to /usr/src (it is unpacked in step 4).
sudo cp linux-2.6.17.tar.bz2 /usr/src
3. Now we are going to move to /usr/src
cd /usr/src
4. Now unpack it:
sudo tar -xvjf linux-2.6.17.tar.bz2
5. Rename the folder:
sudo mv linux-2.6.17/ linux-2.6.17ck1
6. Now we are going to remove the link to the linux directory:
sudo rm -rf linux
7. Make a new link to the new kernel:
sudo ln -s /usr/src/linux-2.6.17ck1 linux
8. Move to the Linux directory:
cd /usr/src/linux
9. Make yourself root:
sudo -s -H
10. Apply the performance patch:
bzcat /home/$USER/patch-2.6.17-ck1.bz2| patch -p1
11. Now we are going to check your current kernel version, so that its configuration can be imported:
uname -r
12. Now import the configuration. Make sure to replace the kernel version in the following command with the one reported by uname -r.
sudo cp /boot/config-2.6.14-ck1 .config
Or you can download my .config file, fully optimized for the IBM T43.

13. Configure the kernel:
make xconfig
14. Let’s build the kernel. Make sure that you are in /usr/src/linux with full root access. This will build a Debian package that you can install.

Continue reading “Compile the new 2.6.17 Linux kernel for IBM ThinkPad T43”

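Ten years, the living and the dead lost to each other; I try not to think of it, yet cannot forget.

Today I read a news story on Sina, "Man in his sixties cycles around the country carrying his wife's ashes", and it brought me close to tears, not out of fragility but out of a long-absent sense of being moved and shaken. An ordinary old man, for the sake of an ordinary promise, has expressed love so completely in his own way...

Ten years, the living and the dead lost to each other.
I try not to think of it, yet cannot forget.
A lonely grave a thousand miles away, nowhere to speak of my sorrow.
Even if we met, you would not know me: dust on my face, hair like frost.
Last night in a dim dream I was suddenly home again.
By the small window, you were combing your hair.
We looked at each other without a word, only a thousand lines of tears.
I know what will break my heart year after year: the moonlit night, the hill of short pines.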

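The Jenkins hash algorithm

The Jenkins hash is probably one of the best hash algorithms available today. It produces a very good distribution; the drawback is that it takes more time than other common hash algorithms. It is worth considering for open addressing hash table implementations. For the details, have a look at Bob Jenkins's site.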

#define hashsize(n) ( 1U << (n) )
#define hashmask(n) ( hashsize ( n ) - 1 )

/* Mix three 32-bit values; every line of the macro ends with a backslash */
#define mix(a,b,c) \
{ \
    a -= b; a -= c; a ^= ( c >> 13 ); \
    b -= c; b -= a; b ^= ( a << 8 ); \
    c -= a; c -= b; c ^= ( b >> 13 ); \
    a -= b; a -= c; a ^= ( c >> 12 ); \
    b -= c; b -= a; b ^= ( a << 16 ); \
    c -= a; c -= b; c ^= ( b >> 5 ); \
    a -= b; a -= c; a ^= ( c >> 3 ); \
    b -= c; b -= a; b ^= ( a << 10 ); \
    c -= a; c -= b; c ^= ( b >> 15 ); \
}

unsigned jen_hash ( unsigned char *k,
                    unsigned length, unsigned initval )
{
    unsigned a, b;
    unsigned c = initval;
    unsigned len = length;

    a = b = 0x9e3779b9;

    /* Consume the key 12 bytes at a time */
    while ( len >= 12 ) {
        a += ( k[0] + ( (unsigned)k[1] << 8 )
                    + ( (unsigned)k[2] << 16 )
                    + ( (unsigned)k[3] << 24 ) );
        b += ( k[4] + ( (unsigned)k[5] << 8 )
                    + ( (unsigned)k[6] << 16 )
                    + ( (unsigned)k[7] << 24 ) );
        c += ( k[8] + ( (unsigned)k[9] << 8 )
                    + ( (unsigned)k[10] << 16 )
                    + ( (unsigned)k[11] << 24 ) );

        mix ( a, b, c );

        k += 12;
        len -= 12;
    }

    c += length;

    /* Handle the last 11 bytes; every case falls through on purpose */
    switch ( len ) {
    case 11: c += ( (unsigned)k[10] << 24 );
    case 10: c += ( (unsigned)k[9] << 16 );
    case 9 : c += ( (unsigned)k[8] << 8 );
    /* First byte of c reserved for length */
    case 8 : b += ( (unsigned)k[7] << 24 );
    case 7 : b += ( (unsigned)k[6] << 16 );
    case 6 : b += ( (unsigned)k[5] << 8 );
    case 5 : b += k[4];
    case 4 : a += ( (unsigned)k[3] << 24 );
    case 3 : a += ( (unsigned)k[2] << 16 );
    case 2 : a += ( (unsigned)k[1] << 8 );
    case 1 : a += k[0];
    }

    mix ( a, b, c );

    return c;
}

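Web 2.0: Angel or Devil?

Lately I have heard a lot of discussion about Web 2.0 around me, and many people insist that Web 2.0 is the best solution to every problem in web applications. But Web 2.0 is only a way of thinking about how to design and build web applications; it is by no means a solution. Excessive worship is itself a kind of irrationality.

Flickr is a superb example of Web 2.0, an extraordinary piece of work by a group of gifted programmers. But does it really drive the whole world crazy? If it disappeared one day, would anyone die for it? Many "traditional sites" still reign supreme in our minds and are everywhere. Google, for example, is already seven years old; who cares whether it offers a Web 2.0 search service? Please don't misread me: more Web 2.0 sites with astonishing ideas will certainly emerge. But the premise of all this is: what are you and your application going to do? What will you offer people? Web 2.0 is just one of countless ways to reach your goal, and one well worth considering.

Continue reading “Web 2.0: Angel or Devil?”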

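Performance comparison of memory allocation methods on Windows

Today I wrote a small program to test the performance differences between memory allocation methods on Windows. It compares VirtualAlloc, malloc, new, and HeapAlloc. The code is simple: allocate and free memory in a loop, then compute the time each method takes.

Results: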

Virtual Memory total time:125,kernel:125,user:0
new total time:203,kernel:171,user:31
malloc total time:125,kernel:125,user:0
heap total time:125,kernel:109,user:15
Press any key to continue . . .

Code:

#include "stdafx.h"
#include "dlclib.h"   // not shown in this post; presumably provides CProfileWatch and MemAlign
#include <iostream>
using namespace std;
const int LOOPS = 1000;

// Print total, kernel and user time (in ms) for one allocation method
void PrintResult(CProfileWatch& watch, LPCSTR pszMethod)
{
    cout << pszMethod << "\t"
         << "total time:" << watch.GetTotalTimeElapsedMs()
         << ",kernel:" << watch.GetKernelTimeElapsedMs()
         << ",user:" << watch.GetUserTimeElapsedMs()
         << "\n";
}

int _tmain(int argc, _TCHAR* argv[])
{
    SYSTEM_INFO info;
    ::GetSystemInfo(&info);
    DWORD dwPageSize = info.dwPageSize;
    CProfileWatch watch;
    // 10 MB, rounded up to a multiple of the page size
    DWORD dwSize = MemAlign(1024 * 1024 * 10, dwPageSize);

    // VirtualAlloc / VirtualFree
    watch.Start();
    for (int i = 0; i < LOOPS; ++i)
    {
        void* pMem = ::VirtualAlloc(NULL, dwSize, MEM_RESERVE |
            MEM_COMMIT | MEM_TOP_DOWN, PAGE_READWRITE);
        ::VirtualFree(pMem, 0, MEM_RELEASE);
    }
    watch.Stop();
    PrintResult(watch, "Virtual Memory");

    // new[] / delete[] (the pointer must keep its type for delete[])
    watch.Start();
    for (int i = 0; i < LOOPS; ++i)
    {
        char* pMem = new char[dwSize];
        delete[] pMem;
    }
    watch.Stop();
    PrintResult(watch, "new");

    // malloc / free
    watch.Start();
    for (int i = 0; i < LOOPS; ++i)
    {
        void* pMem = malloc(dwSize);
        free(pMem);
    }
    watch.Stop();
    PrintResult(watch, "malloc");

    // HeapAlloc / HeapFree on a private, non-serialized heap
    HANDLE hHeap = ::HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
    watch.Start();
    for (int i = 0; i < LOOPS; ++i)
    {
        void* pMem = ::HeapAlloc(hHeap, HEAP_NO_SERIALIZE, dwSize);
        ::HeapFree(hHeap, HEAP_NO_SERIALIZE, pMem);
    }
    watch.Stop();
    PrintResult(watch, "heap");
    ::HeapDestroy(hHeap);
    return 0;
}
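
The program above depends on helpers that are not shown in this post. For readers who want to repeat the experiment elsewhere, here is a minimal portable sketch of the same allocate-and-free loop, timed with std::chrono. The names TimeAllocLoop, TimeMalloc and TimeNew are made up for this example, and it measures wall-clock time only, without the kernel/user split reported above.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Hypothetical portable timing helper: runs `loops` allocate/free rounds
// and returns the elapsed wall-clock time in milliseconds.
template <typename AllocFn, typename FreeFn>
long long TimeAllocLoop(int loops, std::size_t size, AllocFn allocate, FreeFn release)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i) {
        void* p = allocate(size);
        if (p) {
            // Touch the block so the compiler cannot drop the allocation.
            *static_cast<volatile char*>(p) = 1;
        }
        release(p);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// malloc / free
long long TimeMalloc(int loops, std::size_t size)
{
    return TimeAllocLoop(loops, size,
        [](std::size_t n) { return std::malloc(n); },
        [](void* p) { std::free(p); });
}

// new[] / delete[]
long long TimeNew(int loops, std::size_t size)
{
    return TimeAllocLoop(loops, size,
        [](std::size_t n) -> void* { return new char[n]; },
        [](void* p) { delete[] static_cast<char*>(p); });
}
```

Calling TimeMalloc(1000, 10 * 1024 * 1024) and TimeNew(1000, 10 * 1024 * 1024) mirrors the loop count and block size used above; the absolute numbers depend heavily on the platform and allocator.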

Hash table collision resolution

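I recently needed a high-performance hash table implementation and compared the source code of many hash tables. The main difference between them lies in how they resolve collisions.

A comparison of the two approaches to collision resolution (chaining and open addressing):

http://www.absoluteastronomy.com/enc1/hash_table

Google has an open source project with an efficient open addressing hash implementation:

http://goog-sparsehash.sourceforge.net/

Open addressing will probably perform better, provided you have a very good hash function. With chaining, plus a very efficient memory pool to reduce cache misses, average performance should be about the same as open addressing.

While at it, have a look at Judy: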
A Performance Comparison of Judy to Hash Tables
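
To make the open addressing side of the comparison concrete, here is a minimal linear-probing table. It is a sketch under simplifying assumptions, not the sparsehash design: fixed power-of-two capacity, no deletion or resizing, std::hash standing in for a stronger hash function such as the Jenkins hash, and a made-up class name ProbingMap.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical minimal open addressing table (linear probing).
class ProbingMap {
public:
    // Capacity is 2^log2Capacity so the hash can be reduced with a mask.
    explicit ProbingMap(std::size_t log2Capacity)
        : slots_(std::size_t(1) << log2Capacity), mask_(slots_.size() - 1) {}

    // Insert or overwrite. Returns false when the table is full.
    bool Put(const std::string& key, int value)
    {
        std::size_t i = std::hash<std::string>()(key) & mask_;
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            Slot& s = slots_[i];
            if (!s.used || s.key == key) {
                s.used = true;
                s.key = key;
                s.value = value;
                return true;
            }
            i = (i + 1) & mask_;  // collision: try the next slot
        }
        return false;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* Get(const std::string& key) const
    {
        std::size_t i = std::hash<std::string>()(key) & mask_;
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            const Slot& s = slots_[i];
            if (!s.used) return nullptr;  // an empty slot ends the probe sequence
            if (s.key == key) return &s.value;
            i = (i + 1) & mask_;
        }
        return nullptr;
    }

private:
    struct Slot {
        bool used = false;
        std::string key;
        int value = 0;
    };
    std::vector<Slot> slots_;
    std::size_t mask_;
};
```

All entries live in one contiguous array, which is where the cache-friendliness of open addressing comes from; a poor hash function shows up directly as long probe sequences, which is why the quality of the hash matters so much for this scheme.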

A fast memory Pool used by LogMicroscope@35

As you know, new/delete operations take a lot of CPU time, and on a server CPU time is precious. Memory is a different matter: add more memory to the server and the available memory simply grows linearly.
So Log Microscope has its own efficient memory management system, and CFastMemoryPool is one part of it in LogMicroscope@35. Because I never need to delete individual instances, a single call to Shrink() releases every object at once: the extra segments are freed and the initial ones are reset for reuse. It's extremely fast, enjoy it!

class CFastMemoryPool
{
public:
    typedef struct tagPlex
    {
        tagPlex *_next;       // next segment in the list of all segments
        tagPlex *_free;       // next segment in the list of segments with free slots
        size_t  _freeindex;   // index of the first unused object in this segment
#ifndef _WIN64
#if (_AFX_PACKING >= 8)
        DWORD dwReserved[1];    // align on 8 byte boundary
#endif
#endif
        inline void *data()
        {
            return this + 1;    // object storage starts right after the header
        }
    }PLEX, *PPLEX;

    CFastMemoryPool()
    {
        m_pBlock = NULL;
        m_pFreeBlock = 0;
        m_nBlockSize = 0;
        m_nNumberOfObjectsInSegment = 0;
        m_nNumberOfSegmentsStrat = 0;
        m_nObjectSize = 0;
        m_nAllocatedSegment = 0;
    }
    ~CFastMemoryPool()
    {
        Destroy();
    }

    operator bool()const
    {
        return m_pBlock != NULL;
    }
    void *AllocBuffer();
    void Shrink();
    void Destroy();
    HRESULT Initialize(size_t nObjectSize, size_t nNumberOfObjectsInSegment,
        size_t nNumberOfSegmentsStrat);
private:
    PPLEX AllocBlock();
    PPLEX m_pBlock;
    PPLEX m_pFreeBlock;
    size_t m_nBlockSize;
    size_t m_nAllocatedSegment;
    size_t m_nNumberOfObjectsInSegment;
    size_t m_nNumberOfSegmentsStrat;
    size_t m_nObjectSize;
};

// Compute the segment size and allocate the initial segments
HRESULT CFastMemoryPool::Initialize(size_t nObjectSize, size_t nNumberOfObjectsInSegment,
    size_t nNumberOfSegmentsStrat)
{
    SYSTEM_INFO sysinfo;
    GetSystemInfo(&sysinfo);
    m_nObjectSize = nObjectSize;
    m_nBlockSize = MemAlign(nNumberOfObjectsInSegment * nObjectSize + sizeof( PLEX ),
        sysinfo.dwAllocationGranularity);
    m_nAllocatedSegment = 0;
    m_nNumberOfSegmentsStrat = nNumberOfSegmentsStrat;
    m_nNumberOfObjectsInSegment = nNumberOfObjectsInSegment;
    for (size_t i = 0; i < nNumberOfSegmentsStrat; ++i)
    {
        if (AllocBlock() == NULL)
            return E_OUTOFMEMORY;
    }
    return S_OK;
}

// Allocate one segment and push it onto both lists; returns NULL on failure
CFastMemoryPool::PPLEX CFastMemoryPool::AllocBlock()
{
    PPLEX pBlock = static_cast< PPLEX >( VirtualAlloc(NULL,
        m_nBlockSize, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, PAGE_READWRITE) );
    if (pBlock == NULL)
        return NULL;
    ++m_nAllocatedSegment;
    pBlock->_next = m_pBlock;
    m_pBlock = pBlock;
    pBlock->_freeindex = 0;
    pBlock->_free = m_pFreeBlock;
    m_pFreeBlock = pBlock;
    return pBlock;
}

// Return the next unused object, allocating a new segment when all are full
void *CFastMemoryPool::AllocBuffer()
{
    while (m_pFreeBlock)
    {
        if (m_pFreeBlock->_freeindex < m_nNumberOfObjectsInSegment)
        {
            return ((BYTE *)m_pFreeBlock->data() + m_nObjectSize
                * m_pFreeBlock->_freeindex++);
        }
        m_pFreeBlock = m_pFreeBlock->_free;
    }
    if (AllocBlock() == NULL)
        return NULL;
    return ((BYTE *)m_pFreeBlock->data() + m_nObjectSize * m_pFreeBlock->_freeindex++);
}

// Release every object: free the segments beyond the initial count
// (the newest ones sit at the head of the list) and reset the rest
void CFastMemoryPool::Shrink()
{
    while (m_nAllocatedSegment > m_nNumberOfSegmentsStrat)
    {
        PPLEX pKill = m_pBlock;
        m_pBlock = m_pBlock->_next;
        ::VirtualFree(pKill, 0, MEM_RELEASE);
        --m_nAllocatedSegment;
    }
    m_pFreeBlock = NULL;
    PPLEX pPlex = m_pBlock;
    while (pPlex)
    {
        pPlex->_freeindex = 0;
        pPlex->_free = m_pFreeBlock;
        m_pFreeBlock = pPlex;
        pPlex = pPlex->_next;
    }
}

// Free every segment
void CFastMemoryPool::Destroy()
{
    while (m_pBlock)
    {
        PPLEX pKill = m_pBlock;
        m_pBlock = m_pBlock->_next;
        ::VirtualFree(pKill, 0, MEM_RELEASE);
    }
    m_pFreeBlock = NULL;
    m_nAllocatedSegment = 0;
}
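
The class above is tied to the Windows API. The same idea, carving fixed-size objects out of large segments and recycling them all at once, can be sketched portably. This is an illustration only, not LogMicroscope code: the class name SimplePool and its interface are invented for the example, and unlike Shrink() its Reset() keeps every segment.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical portable analogue of the pool idea: fixed-size objects are
// carved out of large segments, and everything is recycled at once.
class SimplePool {
public:
    SimplePool(std::size_t objectSize, std::size_t objectsPerSegment)
        : objectSize_(RoundUp(objectSize)), perSegment_(objectsPerSegment)
    {
        assert(objectSize > 0 && objectsPerSegment > 0);
    }

    // Hand out the next free object, allocating a new segment when needed.
    void* Alloc()
    {
        if (current_ == segments_.size()) {
            segments_.push_back(std::unique_ptr<unsigned char[]>(
                new unsigned char[objectSize_ * perSegment_]));
        }
        void* p = segments_[current_].get() + objectSize_ * next_;
        if (++next_ == perSegment_) {
            next_ = 0;
            ++current_;
        }
        return p;
    }

    // Recycle every object at once; all segments are kept for reuse.
    void Reset()
    {
        current_ = 0;
        next_ = 0;
    }

    std::size_t SegmentCount() const { return segments_.size(); }

private:
    // Round sizes up so that every object stays suitably aligned.
    static std::size_t RoundUp(std::size_t n)
    {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::vector<std::unique_ptr<unsigned char[]>> segments_;
    std::size_t objectSize_;
    std::size_t perSegment_;
    std::size_t current_ = 0;  // index of the segment being filled
    std::size_t next_ = 0;     // next free object within that segment
};
```

An allocation is a bounds check and a pointer addition, and there is no per-object free, which is the source of the speed-up over new/delete when objects are only ever released in bulk.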

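Who Created the Chaos

I came across a joke online that describes the software development process:

1. The programmer writes code he believes is bug-free.
2. The software is tested, and 20 bugs are found.
3. The programmer fixes 10 bugs and tells the test team that the other 10 are not bugs.
4. The test team finds that 5 of the fixes don't work at all, and also finds 15 new bugs.
5. Repeat steps 3 and 4 three times.
6. Under pressure from marketing, and to meet the overly optimistic release schedule set at the start, the product finally ships.
7. Users find 137 new bugs.
8. The programmer, having collected his project bonus, is nowhere to be found.
9. A newly formed project team fixes almost all 137 bugs, but finds 456 new ones.
10. The original programmer sends a postcard from Fiji to the test team, which has been suffering from unpaid wages. The entire test team resigns.
11. The company is bought by a competitor in a hostile takeover. At the time of the acquisition, the final version of the software contains 783 bugs.
12. A new CEO takes office. The company hires a new programmer to rewrite the software.
13. The programmer writes code he believes is bug-free.

It is a joke, but this joke happens almost every day. It reminds me of another one:

One evening an architect, a fisherman, and a programmer sat chatting and began to compare whose profession was the oldest.
“Hey, guys! Everyone knows that fishing is the oldest profession,” said the fisherman.
“Ah,” said the architect, “but before your profession was born, there had to be people. And before mankind was born, who was there in the world?”
“What are you talking about? God?” said the fisherman.
“Exactly. Isn't God the architect of the whole universe?” the architect asked smugly.
The programmer had been silent all along. Now he suddenly cut in: “Then before God became an architect, what was there in the world?”
“Darkness and chaos,” said the fisherman.
“Well, do you know who created the chaos?” said the programmer.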