Lucene range search

Hugo <Nabble>
Using the sample Lucene database I sent you a few days ago, try a query like this on the web_search page:
"+type:user +user_when_created:[0 TO 100]"
It shows all rows and the values are not in the desired range (see below). Any idea why? How can I get the range to work?

Re: Lucene range search

fschmidt
Administrator
I was surprised that you could sort by this field since it is stored as a long.  In general, Lucene is based on strings.  I always store fields that I want to query on as strings.  For dates, you can use this class, which converts them to strings that sort in date order:

http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/document/DateTools.html
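
To illustrate why string-based terms matter for a range like [0 TO 100], here is a minimal sketch in Python (not Luan or Lucene code). The sample values and the helper in_string_range are made up for illustration. Strings compare character by character, so once the digit counts differ, string order and numeric order disagree.

```python
# Hypothetical sample values, stored as plain decimal strings.
values = ["5", "20", "100", "1000"]


def in_string_range(term, low, high):
    """Inclusive range check using string (lexicographic) comparison."""
    return low <= term <= high


# Sorting the raw strings versus sorting by numeric value.
string_order = sorted(values)
numeric_order = sorted(values, key=int)

# The same inclusive range, evaluated on strings and on numbers.
string_hits = [v for v in values if in_string_range(v, "0", "100")]
numeric_hits = [v for v in values if 0 <= int(v) <= 100]

print(string_order, numeric_order)
print(string_hits, numeric_hits)
```

The two orderings and the two hit lists differ, which is the kind of mismatch a string-only index produces for unpadded numbers.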
Woe to those who call bad good and good bad -- Isaiah 5:20
Following the Old Testament, not evil modern culture

Re: Lucene range search

fschmidt
Administrator
Maybe I should convert all number values in indexed fields into a lexicographic string.  I am not sure how to do this.
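
One common way to do this, sketched here in Python under stated assumptions, is fixed-width zero padding. The name pad_long and the width of 19 digits (enough for any non-negative 64-bit value) are assumptions for illustration, and negative numbers are deliberately left out.

```python
WIDTH = 19  # assumption: the largest signed 64-bit value has 19 decimal digits


def pad_long(n: int) -> str:
    """Return a fixed-width decimal string whose string order matches numeric order.

    Only non-negative values that fit in WIDTH digits are handled.
    """
    if n < 0:
        raise ValueError("this sketch does not handle negative numbers")
    text = str(n)
    if len(text) > WIDTH:
        raise ValueError("value does not fit in the fixed width")
    return text.zfill(WIDTH)


numbers = [1000, 5, 100, 20]
padded = [pad_long(n) for n in numbers]

# Sorting the padded strings gives the same order as sorting the numbers.
sorted_padded = sorted(padded)
sorted_numbers = sorted(numbers)
print(sorted_padded)
```

Because every string has the same length, character-by-character comparison agrees with numeric comparison.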

Lucene now claims to support numbers, but this is new/modern, so it will never work right.

Re: Lucene range search

fschmidt
Administrator
A number written in the format described by the following regex sorts correctly in lexicographic order.

-?10\^-?\d\d\*\d(\.\d+)?
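
As a rough illustration of this format, the Python sketch below writes positive numbers as 10^EE*M, with a two-digit exponent and a mantissa that has one digit before the decimal point. The function name to_sci_key is hypothetical, and the sketch only covers positive values with exponents from 0 to 99; it makes no claim about the signed cases.

```python
import re
from decimal import Decimal

# The regex from the post above, used here only to check the output format.
SCI_PATTERN = re.compile(r"-?10\^-?\d\d\*\d(\.\d+)?")


def to_sci_key(text: str) -> str:
    """Format a positive decimal string as 10^EE*M.MMM (assumed layout)."""
    d = Decimal(text)
    if d <= 0:
        raise ValueError("this sketch only handles positive values")
    exponent = d.adjusted()  # position of the leading digit, e.g. 3 for 1500
    if not 0 <= exponent <= 99:
        raise ValueError("exponent outside the range covered by this sketch")
    # Shift to one digit before the point and drop trailing zeros.
    mantissa = d.scaleb(-exponent).normalize()
    return f"10^{exponent:02d}*{mantissa:f}"


samples = ["1500", "7", "100", "1250", "20"]
by_key = sorted(samples, key=to_sci_key)    # string order of the keys
by_value = sorted(samples, key=Decimal)     # numeric order
print([to_sci_key(s) for s in by_key])
```

For these samples the string order of the keys matches the numeric order, because the exponent is compared first and the mantissas all have the same number of digits before the decimal point.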

Re: Lucene range search

Hugo <Nabble>
The easiest solution for me was to convert the date/time into a string like this:
this.when_created = Time.now() .. ''
Now the range works. You can decide what to do with this thread.

Re: Lucene range search

fschmidt
Administrator
Hugo <Nabble> wrote
The easiest solution for me was to convert the date/time into a string like this:
this.when_created = Time.now() .. ''
Now the range works. You can decide what to do with this thread.
This is a hack, and wrong.  If the number of digits for the time happens to change, the sort order will be wrong.

We should come up with something that makes sense.

Re: Lucene range search

Hugo <Nabble>
Yes, I agree this is wrong (although it is okay for the photos project for now). Sorting documents in memory isn't good; Lucene has to do that for us. You can propose something, but it should be easy and intuitive.

Re: Lucene range search

fschmidt
Administrator
I have a clear idea of what I want to do.  You can talk to me on Skype if you want to discuss it, but I will go ahead.

Re: Lucene range search

fschmidt
Administrator
In reply to this post by Hugo <Nabble>
I looked at Lucene again and decided that numeric ranges aren't so bad after all.  So I added Lucene.query.range which you can use like this:
query.range("user_when_created",assert_long(0),assert_long(100))
The query parser doesn't support numeric fields, but I think I can modify it to do this.  I would identify any value starting with "#" as a number.  If the number ends with "L", it would be long.  If it contains a "." then it is double.  Else it is integer.  Then your query would be:
"+type:user +user_when_created:[#0L TO #100L]"
If this makes sense, I will implement it.
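
The proposed rule for values in the query string can be sketched as follows. This is Python for illustration only, not the actual query parser; the function name parse_range_value and the (type, value) tuples it returns are assumptions.

```python
def parse_range_value(token: str):
    """Classify a range endpoint following the rule proposed above.

    Returns a (type_name, value) tuple. Tokens without a leading "#"
    are treated as plain strings.
    """
    if not token.startswith("#"):
        return ("string", token)
    body = token[1:]
    if body.endswith("L"):
        # Trailing "L" marks a long.
        return ("long", int(body[:-1]))
    if "." in body:
        # A decimal point marks a double.
        return ("double", float(body))
    # Anything else after "#" is an integer.
    return ("integer", int(body))


low = parse_range_value("#0L")
high = parse_range_value("#100L")
print(low, high)
```

With this rule, the endpoints of [#0L TO #100L] are both read as longs, while an unmarked value such as 100 stays a string.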

Re: Lucene range search

Hugo <Nabble>
Ideally it should detect the type of the field and build the correct range. If this is too hard, then your idea is fine.

Re: Lucene range search

fschmidt
Administrator
Hugo <Nabble> wrote
Ideally it should detect the type of the field and build the correct range. If this is too hard, then your idea is fine.
The same field can contain different types in different records, so I don't see how this is possible.

Re: Lucene range search

Hugo <Nabble>
Yes, I realized that later. But this can also happen with your idea. What if the search finds a record whose value isn't a number? Will it ignore the record or throw an error?

Re: Lucene range search

fschmidt
Administrator
It will ignore the record.
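
As a sketch of that behavior (plain Python over in-memory records, not how Lucene evaluates the query internally), a numeric range check can simply skip any record whose field is not a number. The record layout and the name numeric_range_hits are made up for illustration.

```python
def numeric_range_hits(records, field, low, high):
    """Return records whose field is a number within [low, high].

    Records where the field is missing or not numeric are skipped
    rather than raising an error.
    """
    hits = []
    for record in records:
        value = record.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if low <= value <= high:
            hits.append(record)
    return hits


# Hypothetical records where the same field holds different types.
records = [
    {"id": 1, "user_when_created": 50},
    {"id": 2, "user_when_created": "50"},   # string, ignored
    {"id": 3, "user_when_created": 150},    # number, out of range
    {"id": 4},                              # field missing, ignored
]
matched_ids = [r["id"] for r in numeric_range_hits(records, "user_when_created", 0, 100)]
print(matched_ids)
```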

So should I make the query parser changes?

Re: Lucene range search

Hugo <Nabble>
Yes.

Re: Lucene range search

fschmidt
Administrator
In reply to this post by Hugo <Nabble>
You should be able to make this work now with numbers.  Let me know if you need help.

Re: Lucene range search

Hugo <Nabble>
I already changed the definition of the "when_created" field to LONG like this:
db.indexed_fields.user_when_created = Lucene.type.long
I also changed the code to save the value as a long. I got no errors and the sorting is working fine. Do I have to create a db step to fix old documents? If so, please show me how.

Re: Lucene range search

fschmidt
Administrator
If it works, then you may not need anything.  If there is a problem, I would add a step like this:
	Versioning.update(db, {
		[1] = Versioning.a_big_step;
		[2] = function()
			-- Visit every user document and fix values still stored as strings.
			db.advanced_search( "type:user", function(_,doc_fn)
				local doc = doc_fn()
				if Luan.type(doc.user_when_created) == "string" then
					-- Parse the old string value and save it back as a long.
					doc.user_when_created = Number.long(String.to_number(doc.user_when_created))
					db.save(doc)
				end
			end )
		end;
	}, 2)

Re: Lucene range search

Hugo <Nabble>
Thanks, the step worked and everything is working fine now.

I am closing this thread.