Wednesday, 20 June 2012

Sphinx part 2 – how the search engine works

OK, in this part we're going to use this simple table:
create table articles( 
id_article int unsigned not null primary key auto_increment, 
title varchar(64) not null, 
content text not null, 
ratings int unsigned 
);

insert into articles set title='Euro 2012 in Poland',content='some content about euro 2012',ratings=5; 
insert into articles set title='Second article',content='bla dog cat house',ratings=2; 
insert into articles set title='Euro 2012 last results',content='Poland loses to czech republic',ratings=4; 
insert into articles set title='Fourth article',content='lorem ipsum black yellow red blue green',ratings=1; 
OK, now the config file (I'm going to use a very basic configuration).
By default the daemon is running, so we have to stop it first:
sudo /etc/init.d/sphinxsearch stop 
OK, so first of all we need to index our data:
sudo indexer -c /etc/sphinxsearch/articles.conf index_articles
Sphinx 2.0.4-id64-release (r3135) 
Copyright (c) 2001-2012, Andrew Aksyonoff 
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) 

using config file '/etc/sphinxsearch/articles.conf'... 
indexing index 'index_articles'... 
WARNING: Attribute count is 0: switching to none docinfo 
collected 4 docs, 0.0 MB 
sorted 0.0 Mhits, 100.0% done 
total 4 docs, 187 bytes 
total 0.008 sec, 21623 bytes/sec, 462.53 docs/sec 
total 2 reads, 0.000 sec, 0.2 kb/call avg, 0.0 msec/call avg 
total 6 writes, 0.000 sec, 0.2 kb/call avg, 0.0 msec/call avg 
We see that 4 records have been indexed. Now let's do a basic search: we want to find all articles that contain the word 'euro'.
search -c /etc/sphinxsearch/articles.conf euro
using config file '/etc/sphinxsearch/articles.conf'... 
index 'index_articles': query 'euro ': returned 2 matches of 2 total in 0.000 sec 

displaying matches: 
1. document=1, weight=2578 
2. document=3, weight=1557 
We see that we found two records, with ids 1 and 3. OK, now let's search for the word 'house':
search -c /etc/sphinxsearch/articles.conf house
displaying matches: 
1. document=2, weight=1695 
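Why do these two queries return exactly these documents? Here is a minimal Python sketch of the idea behind a full-text index, using the four rows from the table above. It is a toy model, not Sphinx's actual implementation: the names ARTICLES, tokenize, build_index and search are made up for illustration, and it ignores weights completely.

```python
import re
from collections import defaultdict

# The four rows from the articles table above: id -> (title, content).
ARTICLES = {
    1: ("Euro 2012 in Poland", "some content about euro 2012"),
    2: ("Second article", "bla dog cat house"),
    3: ("Euro 2012 last results", "Poland loses to czech republic"),
    4: ("Fourth article", "lorem ipsum black yellow red blue green"),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, fields in articles.items():
        for field in fields:
            for word in tokenize(field):
                index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


INDEX = build_index(ARTICLES)
print(search(INDEX, "euro"))   # [1, 3]
print(search(INDEX, "house"))  # [2]
```

The word lookup is a single dictionary access instead of a scan over all rows, which is the basic reason an index-based engine can answer quickly on large tables.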
So that was easy. Now we're going to modify our configuration file a little bit. The next thing is filtering by attributes: to the source section of the config file we add an sql_attr_uint attribute, and rerun the indexer afterwards so the attribute is stored in the index.
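source src_articles
{
        type            = mysql
        sql_host        = localhost
        sql_user        = root
        sql_pass        = krzysiek2000
        sql_db          = test

        # store ratings as an unsigned integer attribute so we can filter on it
        sql_attr_uint   = ratings

        # first column is the document id; title and content are full-text indexed
        sql_query       = select id_article,title,content,ratings from test.articles;
        # used by the command line search tool to display the matching row
        sql_query_info  = select id_article,title,content,ratings from test.articles where id_article = $id;
}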
search -c /etc/sphinxsearch/articles.conf -f ratings 5 euro
displaying matches: 
1. document=1, weight=2578, ratings=5 
 id_article=1 
 title=Euro 2012 in Poland 
 content=some content about euro 2012 
 ratings=5 
We ran a query that returns the rows containing the word 'euro' where ratings equals 5. In the next part, we're going to run similar searches with the PHP API.
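To make the filtering step concrete, here is a small Python sketch of what "match the word, then keep only rows with a given attribute value" means for our four articles. It is only an illustration under my own assumptions, not how Sphinx does it internally; ARTICLES, matches and filtered_search are hypothetical names.

```python
# Rows from the articles table: id -> (title, content, ratings).
ARTICLES = {
    1: ("Euro 2012 in Poland", "some content about euro 2012", 5),
    2: ("Second article", "bla dog cat house", 2),
    3: ("Euro 2012 last results", "Poland loses to czech republic", 4),
    4: ("Fourth article", "lorem ipsum black yellow red blue green", 1),
}


def matches(row, word):
    """Full-text part: is the word present in the title or the content?"""
    title, content, _ratings = row
    return word.lower() in f"{title} {content}".lower().split()


def filtered_search(articles, word, ratings):
    """Keep documents that match the word AND have the given ratings value."""
    return [
        doc_id
        for doc_id, row in sorted(articles.items())
        if matches(row, word) and row[2] == ratings
    ]


print(filtered_search(ARTICLES, "euro", 5))  # [1]
print(filtered_search(ARTICLES, "euro", 4))  # [3]
```

The text match and the attribute check are two separate conditions: title and content are searched as text, while ratings is only compared as a number.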

Monday, 11 June 2012

Sphinx part 1 - basic installation and config

Long time no see ;) I was quite busy at work, and in my free time I've been learning about one awesome thing called Sphinx search server: http://sphinxsearch.com/
Starting now, the next few posts are going to be about Sphinx. It's a pretty awesome full-text search engine.
I'll only say that on a table containing ~40 million records (with partitions, proper indexing and so on) a classic MySQL full-text search takes about 10-15 s. When I do this with Sphinx, it takes less than 1 s :)
So let's get started ;)
First I'm going to show how to install Sphinx on Ubuntu/Debian and check whether it is running properly.


1. Installation:

apt-get install sphinxsearch

and that's why I love Ubuntu :D


2. Basic elements
OK, so before we start the party, a few terms to remember:

indexer – a tool for indexing our data sources
searchd – the daemon responsible for searching the data
search – a command line tool for searching the data
searchapi – the API for programming languages (in this tutorial we're going to focus on PHP)

OK, now that we're ready, let's look at the config file:
vim /etc/sphinxsearch/sphinx.conf

source test_src
{ 
    type            = mysql 
    sql_host        = localhost 
    sql_user        = myuser 
    sql_pass        = mypass 
    sql_db          = database_name 

    sql_query =  select id,name from history

} 


index index_test
{ 
    source              = test_src
    path                = /var/lib/sphinxsearch/data/index_test
    docinfo             = extern 
    charset_type        = utf-8 
} 

indexer 
{ 
    mem_limit       = 32M 
} 


searchd 
{ 
    port            = 3312 
    log             = /var/log/searchd/searchd.log 
    query_log       = /var/log/searchd/query.log 
    pid_file        = /var/log/searchd/searchd.pid 
} 
A typical config contains four parts (source, index, indexer, searchd).
The source, index and indexer sections are required.
The fields in the source and index parts are quite intuitive and should not cause problems.
sql_query defines the query that fetches the data for our source; its first column must be a unique integer document ID.
docinfo defines how document attributes (docinfo) are physically stored on disk and in RAM.
In the next part I will show how to prepare a basic config and start the Sphinx daemon.