MongoDB: Optimize index to avoid the scanAndOrder operation

Recently,I’ve got a performance issue with MongoDB index,it confused me and took me awhile to figure it out.let me try to get this issue straight:I have a compound index on multiple fields,but always result in a slow scan and order operation when performing a range query ,which means that the index is not being used for sorting.

For example,if I have an index “{a:1, b:1, c:-1}”,it works well for find({a:1,b:2}).sort({c:-1}), but  very slowly for find( {a:{$in:[1,2,3]}, b:4}).sort({c:-1}),here’s the output of the explain plan:

Continue reading “MongoDB: Optimize index to avoid the scanAndOrder operation”

OpenSSL for windows

openssl-for-windows,host on google code,is an clean and lightweight OpenSSL library for windows(both 32bit and x64 version).

all versions are compiled in the assembly language routines with Microsoft Visual Studio 2005 and Microsoft MASM as “Multithreaded(/MT)”,you can using it with any version of MSVC.

Download openssl for windows

files included in this library are listed below:

|bin (libeay32.dll, ssleay32.dll,openssl.exe)
|include
|– — openssl  (openssl head files)
|lib (
libeay32.lib,ssleay32.lib)

if you find it useful, a time saver,or even just mildly interesting,then my effort was worth it.
if you have any suggestions on how to improve it, I’d love to hear it.

link to this page: http://www.deanlee.cn/programming/openssl-for-windows/

Enjoy:)

基于Wu-Manber算法的快速多关键词搜索

很多应用都需要高性能的多关键词搜索功能,网络”扫黄”开始后更是如此,如何在文本内容中快速地搜索出敏感关键字变得越来越重要。

Wu-Manber算法是一个不错的解决方案,尤其是针对包含有大量关键词的搜索。Wu-Manber算法的C语言实现非常丰富,为了方便如C#,VB.net等编程语言使用该算法,我用C/C++写了一个基于改进的Wu-Manber算法的多关键词搜索组件:wu-manber-com

Download wu-manber-com

使用方法

以vb.net/c#为例,使用前,先运行regsvr32 stringsearch.dll注册该组件,然后将stringsearch.dll添加到项目引用中。调用示例代码如下:

StringSearchLib.WuManber s = new StringSearchLib.WuManber();
s.AddPatterns("keyword1,keyword2,keyword3", ",", false);
int index = s.Search("... some text for searching...");
Console.WriteLine(index);

该控件不会搜寻所有存在的关键词,碰到第一个匹配的关键词,即返回该匹配的关键词在原文中的位置索引,没有找到则返回-1。

Compile with /MP flag on my system

The most interesting feature I like in VC 2008 is the parallel build capability.

The /MP option can reduce the total time to compile the source files on the command line. The /MP option causes the compiler to create one or more copies of itself, each in a separate process. Then these copies simultaneously compile the source files. Consequently, the total time to build the source files can be significantly reduced.

Continue reading “Compile with /MP flag on my system”

QT/VC2008:Project is rebuilt every time even though I didn’t make any modifications

I have encountered problem when attempting to compile VC projects created by QT 4.5.2,VC always rebuilds the whole project when I start to run or build the project, even though I didn’t make any modifications to the code.the project is always “out of date”.

that is because the custom build step generated by QT for each header files that contain the Q_OBJECT macro are dependent on mocinclude.tmp,which being created every time the project is built,the mocinclude.tmp’s date is later than the header files,so the custom build rule will be run every time.

to solve this problem,open %QTDIR%/makespecs/features/moc.prf,comment out  these two lines(add ‘#’’ at the start of line):

!isEmpty(INCLUDETEMP):moc_source.depends += $$INCLUDETEMP 
......
!isEmpty(INCLUDETEMP):moc_header.depends += $$INCLUDETEMP

and run qmake –tp vc again to regenerate the project files.

rOptiPng:an modification to OptiPNG to support recursive directory optimization

rOptiPng,hosted on google code, is an modification to OptiPNG to support the optimization of multiple directories with recursion,making it easier to use,especially when optimizing many image files in different directories,such as a website.

Download roptipng

To optimization of multiple directories recursively with rOptiPng,simply by using the ** wildcard in file path.
for examples:
roptipng **/*.png
roptipng c:\photos\**\*.png
roptipng c:\photos\**\*.png d:\website\**\*.png
roptipng c:\photos\**\image8.png

image

if you find it useful, a time saver,or even just mildly interesting,then my effort was worth it.
if you have any suggestions on how to improve it, I’d love to hear it.

Enjoy:)

inffas32.asm: error A2070: invalid instruction operands

I got the following errors while compiling zlib on windows with Visual studio 2005:

1>inffas32.asm(649) : error A2070: invalid instruction operands
1>inffas32.asm(663) : error A2070: invalid instruction operands
1>inffas32.asm(720) : error A2070: invalid instruction operands

these errors are due to an issue with Microsoft Macro Assembler ,included with Visual C++ 2005.it refuses to assemble a MOVD instruction with a memory operand with an implied size, and requires that "dword ptr" prefix the memory operand.

to fix these problems,simply addingDWORD PTR’ before the operands:

movd mm5,dword ptr [esp+4](649)
movd mm7,dword ptr [esi](663)
movd mm7,dword ptr [esi](720)

Continue reading “inffas32.asm: error A2070: invalid instruction operands”

How to create a jabber chat bot based on AIML,the C# way.

I have created a jabber chat bot based on on the A.L.I.C.E(Artificial Linguistic Internet Computer Entity) and AIML(Artificial Intelligence Markup Language).which is named alice@35.cn,you can add alice@35.cn to your contact list and chat with her anytime:

image

It’s really simple to build your own alice chat bot by using agsXMPP and AIMLBot.The following sections is my quick and dirty tour on how to do it(You should know how to use agsXMPP and AIMLBot first):

After downloading and compiling the agsXMPP and AIMLBot,add them as References to your bot project:

image

Initialize AIMLBot :

AIMLbot.Bot myBot = new AIMLbot.Bot();
myBot.loadSettings();
myBot.loadAIMLFromFiles();

Response chat message in XmppClientConnection’s OnMessage Event:

AIMLbot.Request request = new Request(msg.Body, GetUser(msg.From.Bare), myBot);
AIMLbot.Result reply = myBot.Chat(request);
agsXMPP.protocol.client.Message replyMsg = new agsXMPP.protocol.client.Message();
replyMsg.Type = MessageType.chat;
replyMsg.To = msg.From;
replyMsg.Body = reply.Output; 

XmppCon.Send(replyMsg); 

a simple implementation of GetUser:

protected AIMLbot.User GetUser(String jid)
{
   Hashtable myHash = Hashtable.Synchronized(userSessions);
   AIMLbot.User user = (AIMLbot.User)myHash[jid];
   if (user == null)
   {
       user = new AIMLbot.User(jid, myBot);
       myHash[jid] = user;
   }
   return user;
} 

that’s all,it’s really easy and simple,huh?

PS:you can download free AIML sets from here

How to compile TORCS with visual studio 2005(VC 8.0)

TORCS is an open source, highly portable multi platform car racing simulation. It is used as ordinary car racing game, as AI racing game and as research platform.
image

TORCS can be compiled smoothly with Visual C++ 6.0.but you’ll get lots of compilation errors when trying to compile it with Visual Studio 2005(VC 8).here is the steps to resolve this issue:

Continue reading “How to compile TORCS with visual studio 2005(VC 8.0)”

What I’m doing and what will be done

With months of development during off-duty time,I had created an web-based Microsoft SQL Database administration tool in C/C++,which codename is "Labrina".

image 

With Labrina , you will do through a browser almost everything you did before with MS Enterprise Manager.It is intended to handle the administration of SQL Server databases over the Web. It allows you to fully manage your databases even when you don’t want or cannot use MS Enterprise Manager.

Continue reading “What I’m doing and what will be done”

Compiler warning C4022 when building OpenSSL 0.9.8e On Windows

I got the following compiler error when building OpenSSL-0.9.8e in Windows environments with Visual studio 2005:

.\apps\enc.c(380) : error C2220: warning treated as error – no ‘object’ file generated
.\apps\enc.c(380) : warning C4022: ‘BIO_ctrl’ : pointer mismatch for actual parameter 4

Here is the quick fix to solve this compile error:

Open file .\apps\enc.c,Go to line 380:

if (BIO_read_filename(in,inf <= 0))
{
perror(inf);
goto end;
}

 Change it to

if (BIO_read_filename(in,(char*)(inf <= 0)))
{
perror(inf);
goto end;
}

you can download the compiled binary here.

How to detect whether or not user is running a full-screen program

Recently I was asked how to detect whether or not user is runing a full screen program,here is the code snippet: 

bool IsFullScreenMode()
{
  int w = GetSystemMetrics(SM_CXSCREEN);
  int h = GetSystemMetrics(SM_CYSCREEN); 

  HWND hWnd = 0;
  while (hWnd = FindWindowEx(NULL, hWnd, NULL, NULL))
  {
    if (GetWindowLong(hWnd, GWL_EXSTYLE) & WS_EX_TOPMOST)
    {
      RECT rcWindow;
      GetWindowRect(hWnd, &rcWindow);
      if ((w == (rcWindow.right - rcWindow.left)) &&
         (h == (rcWindow.bottom - rcWindow.top)))
           return true; 
     }
  } 
  return false;
}

Convert Ip Address To Geographic Location

基于C++的CIPSeeker类,查找IP所在的地理位置,同时为了方便在其他脚本语言中提供IP地理位置查找功能(asp或者C#,vb等),另作一个COM控件来包装这个类。

网上有不少关于如何使用纯真IP数据库(QQWRY.dat),来查找IP所在的地理位置的文档和源代码,但基本都是java实现,很少有C的,而且大部分都使用了打开文件句柄,通过文件指针的多次跳转来实现单次查找,运行效率很低,尤其是在短时间内需要执行大量IP->地理位置的转换的场合,这种方式的运行性能基本是难以接受的。

Continue reading “Convert Ip Address To Geographic Location”

A Programmer’s Reference for Excuses

Private Function getProgrammerExcuse() As String

Dim myExcuses(1 to 21) As String

myExcuses(1) = "That's Weird...",
myExcuses(2) = "It's never done that before.",
myExcuses(3) = "It worked yesterday.",
myExcuses(4) = "How is that possible?",
myExcuses(5) = "It must have a hardware problem.",
myExcuses(6) = "What did you type in wrong to get it to crash." ,
myExcuses(7) = "There is something funky in your data",
myExcuses(8) = "I haven't touched that module in weeks!" ,
myExcuses(9) = "You must have wrong version.",
myExcuses(10) = "It's just some unlucky coincidence." ,
myExcuses(11) = "I can't test everything!",
myExcuses(12) = "THIS can't be the source of THAT." ,
myExcuses(13) = "It works, but it's not been tested." ,
myExcuses(14) = "Somebody must have changed my code.",
myExcuses(15) = "Did you check for a virus on your system?",
myExcuses(16) = "Even though it doesn't work, how does it feel?" ,
myExcuses(17) = "You can't use that version on your system.",
myExcuses(18) = "Why do you want to do it that way?",
myExcuses(19) = "Where were you when the program blew up?",
myExcuses(20) = "I thought I fixed that.",
myExcuses(21) = "This is not a bug,it's undocumented feature."

Dim randomChoice As New Random(UBound(myExcuses))
Dim iRandomChoice As Integer = randomChoice.Next()

Return myExcuses(iRandomChoice)

End Function

you can get more funny code from Comedy Code

Convert Unicode To UTF8

char* __stdcall UnicodeToUtf8( const WCHAR* wstr )
{
    const WCHAR* w;
    // Convert unicode to utf8
    int len = 0;
    for ( w = wstr; *w; w++ ) {

        if ( *w < 0x0080 ) len++;
        else if ( *w < 0x0800 ) len += 2;
        else len += 3;
    }

    unsigned char* szOut = ( unsigned char* )malloc( len+1 );

    if ( szOut == NULL )
        return NULL;

    int i = 0;
    for ( w = wstr; *w; w++ ) {
        if ( *w < 0x0080 )
            szOut[i++] = ( unsigned char ) *w;
        else if ( *w < 0x0800 ) {
            szOut[i++] = 0xc0 | (( *w ) >> 6 );
            szOut[i++] = 0x80 | (( *w ) & 0x3f );
        }
        else {
            szOut[i++] = 0xe0 | (( *w ) >> 12 );
            szOut[i++] = 0x80 | (( ( *w ) >> 6 ) & 0x3f );
            szOut[i++] = 0x80 | (( *w ) & 0x3f );
        }    }

    szOut[ i ] = '\0';
    return ( char* )szOut;
}

Jenkins hash算法。

Jenkins hash,可能是目前能看到的最好的hash算法之一,可以产生很好的分布,缺点是相比其他常见的hash算法更耗时。可以考虑用于hash表的open addressing实现上。如果想了解细节的话,可以去Bob Jenkins的站点看一看。

#define hashsize(n) ( 1U << (n) )
#define hashmask(n) ( hashsize ( n ) - 1 )

#define mix(a,b,c) 
{ 
a -= b; a -= c; a ^= ( c >> 13 ); 
b -= c; b -= a; b ^= ( a << 8 ); 
c -= a; c -= b; c ^= ( b >> 13 ); 
a -= b; a -= c; a ^= ( c >> 12 ); 
b -= c; b -= a; b ^= ( a << 16 ); 
c -= a; c -= b; c ^= ( b >> 5 ); 
a -= b; a -= c; a ^= ( c >> 3 ); 
b -= c; b -= a; b ^= ( a << 10 ); 
c -= a; c -= b; c ^= ( b >> 15 ); 
}

unsigned jen_hash ( unsigned char *k,
unsigned length, unsigned initval )
{
unsigned a, b;
unsigned c = initval;
unsigned len = length;

a = b = 0x9e3779b9;

while ( len >= 12 ) {
a += ( k[0] + ( (unsigned)k[1] << 8 )
+ ( (unsigned)k[2] << 16 )
+ ( (unsigned)k[3] << 24 ) );
b += ( k[4] + ( (unsigned)k[5] << 8 )
+ ( (unsigned)k[6] << 16 )
+ ( (unsigned)k[7] << 24 ) );
c += ( k[8] + ( (unsigned)k[9] << 8 )
+ ( (unsigned)k[10] << 16 )
+ ( (unsigned)k[11] << 24 ) );

mix ( a, b, c );

k += 12;
len -= 12;
}

c += length;

switch ( len ) {
case 11: c += ( (unsigned)k[10] << 24 );
case 10: c += ( (unsigned)k[9] << 16 );
case 9 : c += ( (unsigned)k[8] << 8 );
/* First byte of c reserved for length */
case 8 : b += ( (unsigned)k[7] << 24 );
case 7 : b += ( (unsigned)k[6] << 16 );
case 6 : b += ( (unsigned)k[5] << 8 );
case 5 : b += k[4];
case 4 : a += ( (unsigned)k[3] << 24 );
case 3 : a += ( (unsigned)k[2] << 16 );
case 2 : a += ( (unsigned)k[1] << 8 );
case 1 : a += k[0];
}

mix ( a, b, c );

return c;
}

Hash table collision resolution

最近需要一个高性能的hash实现,对比了很多实现hash表的源代码,主要区别在于冲突检测的实现方式。

关于冲突检测的两种方式的对比(chaining and open addressing):

http://www.absoluteastronomy.com/enc1/hash_table

Google有一个高效的实现Open addressing hash 的开源项目

http://goog-sparsehash.sourceforge.net/

Open addressing的性能可能会更好些,但是前提是需要一个非常好的hash生成算法。如果使用Chaining方式,加上一个非常高效的memory pool管理,减少cache失效时间,平均性能应该和Open addressing差不多。

顺便了解下judy:
A Performance Comparison of Judy to Hash Tables

A fast memory Pool used by LogMicroscope@35

As you know, new/delete operations take a lot of CPU time. If you work with servers, CPU time is important. If additional memory is added to the server, then the servers’ available memory size will grow in a linear fashion.
So Log Microscope has it’s own efficient memory management system. CFastMemoryPool is the one of them for LogMicroscope@35. because I never need to delete the instance,so a single call to Shrink() will free all memory allocated.it’s extreme fast,enjoy it!

class CFastMemoryPool
{
public:
typedef struct tagPlex
{
tagPlex *_next;
tagPlex *_free;
size_t  _freeindex;
#ifndef _WIN64
#if (_AFX_PACKING >= 8)
DWORD dwReserved[1];    // align on 8 byte boundary
#endif
#endif
inline void *data()
{
return this + 1;
}
}PLEX, *PPLEX;

CFastMemoryPool()
{
m_pBlock = NULL;
m_pFreeBlock = 0;
m_nBlockSize = 0;
m_nNumberOfObjectsInSegment = 0;
m_nNumberOfSegmentsStrat = 0;
m_nObjectSize = 0;
m_nAllocatedSegment = 0;
}
~CFastMemoryPool()
{
Destroy();
}

operator bool()const
{
return m_pBlock != NULL;
}
void *AllocBuffer();
void Shrink();
void Destroy();
HRESULT Initialize(size_t nObjectSize, size_t nNumberOfBuffersInSegment,
size_t nNumberOfSegmentsStrat);
private:
PPLEX AllocBlock();
PPLEX m_pBlock;
PPLEX m_pFreeBlock;
size_t m_nBlockSize;
size_t m_nAllocatedSegment;
size_t m_nNumberOfObjectsInSegment;
size_t m_nNumberOfSegmentsStrat;
size_t m_nObjectSize;
};

HRESULT CFastMemoryPool::Initialize(size_t nObjectSize, size_t nNumberOfObjectsInSegment,
size_t nNumberOfSegmentsStrat)
{
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
m_nObjectSize = nObjectSize;
m_nBlockSize = MemAlign(nNumberOfObjectsInSegment * nObjectSize + sizeof( PLEX ),
sysinfo.dwAllocationGranularity);
m_nAllocatedSegment = 0;
m_nNumberOfSegmentsStrat = nNumberOfSegmentsStrat;
m_nNumberOfObjectsInSegment = nNumberOfObjectsInSegment;
for (size_t i = 0; i < nNumberOfSegmentsStrat; ++i)
{
AllocBlock();
}
return S_OK;
}

CFastMemoryPool::PPLEX CFastMemoryPool::AllocBlock()
{
PPLEX pBlock = static_cast< PPLEX >( VirtualAlloc(NULL,
m_nBlockSize, MEM_RESERVE|MEM_COMMIT|MEM_TOP_DOWN, PAGE_READWRITE) );
++m_nAllocatedSegment;
pBlock->_next = m_pBlock;
m_pBlock = pBlock;
pBlock->_freeindex = 0;
pBlock->_free = m_pFreeBlock;
m_pFreeBlock = pBlock;
return pBlock;
}

void *CFastMemoryPool::AllocBuffer()
{
while (m_pFreeBlock)
{
if (m_pFreeBlock->_freeindex < m_nNumberOfObjectsInSegment)
{
return ((BYTE *)m_pFreeBlock->data() + m_nObjectSize
* m_pFreeBlock->_freeindex++);
}
m_pFreeBlock = m_pFreeBlock->_free;
}
AllocBlock();
return ((BYTE *)m_pFreeBlock->data() + m_nObjectSize * m_pFreeBlock->_freeindex++);
}

void CFastMemoryPool::Shrink()
{
for (size_t i = m_nNumberOfSegmentsStrat; i < m_nAllocatedSegment; ++i)
{
PPLEX pKill = m_pBlock;
m_pBlock = m_pBlock->_next;
::VirtualFree(pKill, 0, MEM_RELEASE);
-- m_nAllocatedSegment;
}
m_pFreeBlock = NULL;
PPLEX pPlex = m_pBlock;
while (pPlex)
{
pPlex->_freeindex = 0;
pPlex->_free = m_pFreeBlock;
m_pFreeBlock = pPlex;
pPlex=pPlex->_next;
}
}

void CFastMemoryPool::Destroy()
{
PPLEX pPlex = m_pBlock;
while (m_pBlock)
{
PPLEX pKill = m_pBlock;
m_pBlock = m_pBlock->_next;
::VirtualFree(pKill, 0, MEM_RELEASE);
}
m_pFreeBlock = NULL;
m_nAllocatedSegment = 0;
}

谁制造了混乱

网上看到一篇笑话,描述软件开发过程的:

1. 程序员写出自认为没有Bug的代码。
2. 软件测试,发现了20个Bug。
3. 程序员修改了10个Bug,并告诉测试组另外10个不是Bug。
4. 测试组发现其中5个改动根本无法工作,同时又发现了15个新Bug。
5. 重复3次步骤3和步骤4。
6. 鉴于市场方面的压力,为了配合当初制定的过分乐观的发布时间表,产品终于上市了。
7. 用户发现了137个新Bug。
8. 已经领了项目奖金的程序员不知跑到哪里去了。
9. 新组建的项目组修正了差不多全部137个Bug,但又发现了456个新Bug。
10. 最初那个程序员从斐济给饱受拖欠工资之苦的测试组寄来了一张明信片。整个测试
组集体辞职。
11. 公司被竞争对手恶意收购。收购时,软件的最终版本包含783个Bug。
12. 新CEO走马上任。公司雇了一名新程序员重写该软件。
13. 程序员写出自认为没有Bug的代码。

虽然是个笑话,但是这个笑话几乎每天都在发生。由此想起另一个笑话:

晚上,一个建筑师、一个钓鱼的和一个程序员坐在一起聊天,并
开始比较他们各自的职业哪一个更为古老。
“嘿,兄弟们!大家都晓得钓鱼是最古老的职业。”钓鱼的说。
“啊,”建筑师说,“但在你的职业诞生之前,总要有人才行吧。
那么,人类诞生之前,这世界上又有谁呢?”
“你在说什么呀?难道是上帝吗?”钓鱼的说。
“对呀,难道上帝不是这整个宇宙的建筑师吗?”建筑师自鸣得
意地问。
程序员一直在沉默,这时,他突然插话说:“那么,在上帝成为
建筑师之前,这世界上有什么?”
“黑暗和混乱。”钓鱼的说。
“那么,你知道是谁制造了混乱吗?”程序员说。