%Startswith or LIKE?

Solapas principales

Could anyone please explain why Caché has a %STARTSWITH function while it supports ANSI SQL "LIKE"?

 

thank you very much.

 

 

  • + 1
  • 0
  • 320
  • 24

Respuestas

%STARTSWITH relates better to Caché internal structures and is faster in larger scale

%STARTSWITH is not faster or slower when comparing apples to apples.

LIKE can find a substring wherever it occurs, and has multi-character and single-character wildcards. %STARTSWITH is looking only at the beginning of the string, so it's equivalent to LIKE 'ABC%'.

Updating to match another updated post lower on this page. If the comparison string is parameterized, LIKE sometimes does an extra check, so %STARTSWITH will be slightly faster.

When the comparison string ('ABC%' and 'ABC') is fixed. The code that checks LIKE 'ABC%' is exactly the same as the code that checks %STARTSWITH 'ABC'

I was told that before but I never know it written in document. Thanks. 

I did a little bit more research.

  • Maybe %STARTSWITH 'abc' was at one time faster than the equivalent predicate LIKE 'abc%'.
  • The quote comes from the FOR SOME %ELEMENT predicate documentation. This predicate can be used with Collections and an old feature called Free Text Search. The quote was actually only meant to apply to the Free Text Search usage.
  • I've tested %STARTSWITH 'abc' and LIKE 'abc%' today using FOR SOME %ELEMENT with Collections and Free Text Search. The code is identical.

Conclusion: the quote will be removed from the documentation since it's no longer true.

Thanks, @Vitaliy Serdtsev, for making me realize that I should have been testing with placeholders rather than fixed values to the right of %STARTSWITH or LIKE. I was testing with Embedded SQL; with fixed values, my earlier statements are true. But if the query itself uses placeholders (? or host variables), or the WHERE clause is parameterized automatically (thanks, @Eduard Lebedyuk, for mentioning that) then the generated code differs, and LIKE sometimes does do an extra (slightly slower) comparison, because at runtime, LIKE could get a simple pattern ("abc%") or a complex one ("a_b%g_i") and the code has to cope with those possibilities.

New conclusion: the quote will be clarified so that it mentions placeholders/paramaterization and moved to the %STARTSWITH and LIKE documentation, instead of being buried in FOR SOME %ELEMENT.

And thanks to @Hao Ma for bringing this up!

Conclusion: the quote will be removed from the documentation since it's no longer true.

Then besides this, in the documentation for %STARTSWITH need to add the note DEPRECATED and the recommendation "use LIKE 'XXX%'"

I also did an analysis for Caché 2018.1

Class del.t Extends %Persistent
{

Index ip On p;

Property As %VarString;

/// d ##class(del.t).Fill()
ClassMethod Fill(1000000)
{
  DISABLE^%NOJRN
  ^del.tD,^del.tI

  i=1:1:^del.tD(i)=$lb("","test"_i)
  ^del.tD=N
  ENABLE^%NOJRN

  ..%BuildIndices(,,,$$$NO)
  d $system.SQL.TuneTable($classname(),$$$YES)
  d $system.OBJ.Compile($classname(),"cu-d")
}
}

Although the plans are exactly the same in SMP the results of the metrics differ:

select count(*from del.where like 'test7%'
Row count: 1 Performance: 0.291 seconds 333340 global references 2000537 lines executed

select count(*from del.where %startswith 'test7'
Row count: 1 Performance: 0.215 seconds 333340 global references 1889349 lines executed

For the next two queries, the INT code matches:

&sql(select * from del.where like 'test7%')
&sql(select * from del.where %startswith 'test7')

But for these - is already different, so the metrics in SMP are different:

&sql(select * from del.where like :a)
&sql(select * from del.where %startswith :a)

In IRIS 2020.1, the work with embedded queries was changed, but I can't check it.

Hi Evgeny!

I can't check for technical reasons. Docker version does not suit me.

Understand. Just curious - does Try IRIS work for you for testing purposes? Or are there any issues with Try IRIS? I mean we could fix it if any.

Then besides this, in the documentation for %STARTSWITH need to add the note DEPRECATED and the recommendation "use LIKE 'XXX%'"

select count(*from del.where like 'test7%'
Row count: 1 Performance: 0.291 seconds 333340 global references 2000537 lines executed

select count(*from del.where %startswith 'test7'
Row count: 1 Performance: 0.215 seconds 333340 global references 1889349 lines executed

I'm not sure what you mean here. %STARTSWITH executed fewer lines so why would we recommend LIKE instead?

Forget.
Now %STARTSWITH all other things being equal is slightly faster than LIKE. This point is deeply hidden in the documentation, and it seems that this applies only to FOR SOME %ELEMENT.
If manage to speed up the special case for LIKE, then still need to correct/supplement the documentation.

I have never heard of anyone issuing the blanket statement that InterSystems predicates are faster or slower than the ANSI standard ones. I don't think there are that many predicates that have similar functionality. As I said in a different comment, %STARTWITH 'abc' is 100% equivalent to LIKE 'abc%'. InterSystems also provides %MATCHES and %PATTERN, but they are different.

Than you all for your replies. I heard the %STARTSWITH has better performance but I never know it is in the online document. However, I am a little confused: instead of recommending users to use something with which they are not familiar, why not make 'LIKE' faster?

I'm surprised you don't see the obvious performance difference of looking for something of distinct length at the beginning of a string
vs. scanning an eventual rather long string for some bytes somewhere eventually including also composed strings as %AB%CD%.

keep in mind: Caché is built for speed, not for the comfort of the average programmer 

I think that the author meant that the simplest queries of the form
like 'text%'
automatically worked as/converted to
%startswith 'text'

To answer that, before query is compiled all arguments are parametrized:

like 'text%'

becomes

like ?

so we can't really replace LIKE with %STARTSWITH on code generation step (there's a brackets argument specification I suppose).

Thanks, Robert!

> keep in mind: Caché is built for speed, not for the comfort of the average programmer  

We work hard to make IRIS not only fast but comfortable too for any backend, full-stack, AI developer. Pinging @Raj Singh , our Product Manager on Developer Experience.


 

I think the good sense is the key,

Today's compiler's optimizers detect the most of common expressions and generate the better performatic code, but we can't be obsessed with readability.

A good post: Performance vs Readability

Sure, Andre!

What's wrong with %CONTAINS?

I think it deserves a separate question/discussion

Quote from the documentation:

For performance reasons, the predicate %STARTSWITH 'abc' is preferable to the equivalent predicate LIKE 'abc%'. proof