cts:distinctive-terms(
  $nodes as node()*,
  [$options as element()?]
) as element(cts:class)
Summary:
Return the most "relevant" terms in the model nodes (that is, the
terms with the highest scores).
|
Parameters:
$nodes
:
Some model nodes.
|
$options
(optional):
An XML representation of the options for defining which terms to
generate and how to evaluate them.
The options node must be in the cts:distinctive-terms
namespace. The following is a sample options node:
<options xmlns="cts:distinctive-terms">
<max-terms>20</max-terms>
</options>
The
cts:distinctive-terms options (which are also valid for
cts:similar-query, cts:train,
and cts:cluster)
include:
<max-terms>
- An integer defining the maximum number of distinctive terms to list
in the
cts:distinctive-terms output. The default is 16.
<min-val>
- A double specifying the minimum value a term can
have and still be considered a distinctive term. The default is 0.
<min-weight>
- A number specifying the minimum weighted term frequency a term can
have and still be considered a distinctive term. In general this value
will be either 0 (include unweighted terms) or 1 (don't include unweighted
terms). The default is 1.
<score>
- A string defining which scoring method to use in comparing the values
of the terms.
The default is
logtfidf. See the description of scoring
methods in the cts:search function for more details.
Possible values are:
logtfidf
- Compute scores using the logtfidf method.
logtf
- Compute scores using the logtf method.
simple
- Compute scores using the simple method.
<use-db-config>
- A boolean value indicating whether to use the current DB configuration
for determining which terms to use. The default is
true.
Setting the value to false means that the indexing
options in the options node will be used, as well as the default value
for any of the options not specified. This may be used to easily
target a small set of terms.
<complete>
- A boolean value indicating whether to return terms even if there is no
query associated with them. The default is false.
The options element also includes indexing options in the
http://marklogic.com/xdmp/database namespace.
These control which terms to use.
These database options include the following, shown here with
a db prefix to denote the
http://marklogic.com/xdmp/database namespace. The default
given for each option is the value used if use-db-config is set
to false:
<db:word-searches>
- Include terms for the words in the node. The default is 'false'.
<db:stemmed-searches>
- Define whether to include terms for the stems in the node, and at
what level of stemming:
off, basic,
advanced, or decompounding. The default is 'basic'.
<db:fast-case-sensitive-searches>
- Include terms for case-sensitive variations of the words in the
node. The default is 'false'.
<db:fast-diacritic-sensitive-searches>
- Include terms for diacritic-sensitive variations of the words in
the node. The default is 'false'.
<db:fast-phrase-searches>
- Include terms for two-word phrases in the node. The default is 'true'.
<db:phrase-throughs>
- If phrase terms are included, include terms for phrases that cross the given
elements. The default is to have no such elements.
<db:phrase-arounds>
- If phrase terms are included, include terms for phrases that skip over the
given elements. The default is to have no such elements.
<db:fast-element-word-searches>
- Include terms for words in particular elements. The default is 'true'.
<db:fast-element-phrase-searches>
- Include terms for phrases in particular elements. The default is 'true'.
<db:element-word-query-throughs>
- Include terms for words in sub-elements of the given elements. The default is to have no such elements.
<db:fast-element-character-searches>
- Include terms for characters in particular elements. The default is 'false'.
<db:range-element-indexes>
- Include terms for data values in specific elements. The default is to have no such indexes.
<db:range-field-indexes>
- Include terms for data values in specific fields. The default is to have no such indexes.
<db:range-element-attribute-indexes>
- Include terms for data values in specific attributes. The default is to have no such indexes.
<db:one-character-searches>
- Include terms for single characters. The default is 'false'.
<db:two-character-searches>
- Include terms for two-character sequences. The default is 'false'.
<db:three-character-searches>
- Include terms for three-character sequences. The default is 'false'.
<db:trailing-wildcard-searches>
- Include terms for trailing wildcards. The default is 'false'.
<db:fast-element-trailing-wildcard-searches>
- If trailing wildcard terms are included, include terms for
trailing wildcards by element. The default is 'false'.
<db:fields>
- Include terms for the defined fields. The default is to have no fields.
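As a minimal sketch of how these options combine, the following call sets use-db-config to false and asks for unstemmed word terms only, scored with logtf. The sample node and the specific option values are illustrative assumptions; the terms actually returned depend on the server version and its defaults.

```xquery
xquery version "1.0-ml";
(: Illustrative only: the <doc> node and the option values are assumptions.
   With use-db-config set to false, only the db: options listed here
   (plus the defaults for unlisted options) control which terms are used. :)
cts:distinctive-terms(
  <doc>The quick brown fox jumped over the lazy dog.</doc>,
  <options xmlns="cts:distinctive-terms"
           xmlns:db="http://marklogic.com/xdmp/database">
    <max-terms>5</max-terms>
    <score>logtf</score>
    <use-db-config>false</use-db-config>
    <db:word-searches>true</db:word-searches>
    <db:stemmed-searches>off</db:stemmed-searches>
    <db:fast-phrase-searches>false</db:fast-phrase-searches>
    <db:fast-element-word-searches>false</db:fast-element-word-searches>
    <db:fast-element-phrase-searches>false</db:fast-element-phrase-searches>
  </options>)
```

Turning off the options that default to 'true' is what narrows the output to plain word terms; leaving them out would keep phrase and element-word terms in play.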
|
|
Usage Notes:
Output Format
The output of the function is a cts:class element containing a
sequence of cts:term elements. (This is the same as the weights
form of a class for the SVM classifier; see cts:train.) Each
cts:term element identifies the term ID as well as a score,
confidence, and fitness measure for the term, in addition to a
cts:query that corresponds to the term. The correspondence of
terms to queries is not precise: queries typically make use of multiple
terms, and not all terms correspond to a query. However, a search using the
query given for a term will match the model node that gave rise to it.
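One way to consume this output, sketched here under the assumption that the cts:query constructor function is available in your server version, is to turn the query inside each cts:term back into a cts:query and estimate how many fragments it matches. The document URI "book.xml" and the shape of the <term> summary element are hypothetical.

```xquery
xquery version "1.0-ml";
(: Assumes a document with the URI "book.xml" exists in the database. :)
let $class := cts:distinctive-terms(fn:doc("book.xml"),
  <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options>)
for $term in $class/cts:term
(: The child element of cts:term is the XML form of the query for that term. :)
let $query := cts:query($term/*[1])
return
  <term id="{$term/@id}" score="{$term/@score}"
        estimate="{xdmp:estimate(cts:search(fn:doc(), $query))}"/>
```

This relies on the default value of the complete option (false), so that every returned term carries a query; with complete set to true, a term may have no child element to convert.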
|
Example:
cts:distinctive-terms( fn:doc("book.xml"),
<options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options> )
=>
<cts:class name="dterms book.xml" offset="0" xmlns:cts="http://marklogic.com/cts">
<cts:term id="1230725848944963443" val="482" score="372" confidence="0.686441" fitness="0.781011">
<cts:element-word-query>
<cts:element>title</cts:element>
<cts:text xml:lang="en">the</cts:text>
<cts:option>case-insensitive</cts:option>
<cts:option>diacritic-insensitive</cts:option>
<cts:option>stemmed</cts:option>
<cts:option>unwildcarded</cts:option>
</cts:element-word-query>
</cts:term>
<cts:term id="2859044029148442125" val="435" score="662" confidence="0.922555" fitness="0.971371">
<cts:word-query>
<cts:text xml:lang="en">text</cts:text>
<cts:option>case-insensitive</cts:option>
<cts:option>diacritic-insensitive</cts:option>
<cts:option>stemmed</cts:option>
<cts:option>unwildcarded</cts:option>
</cts:word-query>
</cts:term>
<cts:term id="17835615465481541363" val="221" score="237" confidence="0.65647" fitness="0.781263">
<cts:word-query>
<cts:text xml:lang="en">of</cts:text>
<cts:option>case-insensitive</cts:option>
<cts:option>diacritic-insensitive</cts:option>
<cts:option>stemmed</cts:option>
<cts:option>unwildcarded</cts:option>
</cts:word-query>
</cts:term>
</cts:class>
|
Example:
cts:distinctive-terms(//title,
<options xmlns="cts:distinctive-terms">
<use-db-config>true</use-db-config>
</options>)
=> a cts:class element containing the 16 most distinctive query terms
|
Example:
cts:distinctive-terms(<foo>hello there you</foo>,
<options xmlns="cts:distinctive-terms"
xmlns:db="http://marklogic.com/xdmp/database">
<db:word-positions>true</db:word-positions>
</options>)
=> a cts:class element containing the 16 most distinctive query terms
|
cts:entity-highlight(
  $node as node(),
  $expr as item()*
) as node()
Summary:
Returns a copy of the node, replacing any entities found
with the specified expression. You can use this function
to easily highlight any entities in an XML document in an arbitrary manner.
If you do not need fine-grained control of the XML markup returned,
you can use the entity:enrich XQuery module function instead.
A valid entity enrichment license key is required
to use cts:entity-highlight;
without a valid license key, it throws an exception. If you
have a valid license for entity enrichment, you can enrich text
in English and in any other language for which you have a valid license
key. For languages for which you do not have a valid license key,
cts:entity-highlight finds no entities in text in that
language.
|
Parameters:
$node
:
A node to run entity highlight on. The node must be either a document node
or an element node; it cannot be a text node.
|
$expr
:
An expression with which to replace each match. You can use the
variables $cts:text, $cts:node,
$cts:entity-type, $cts:normalized-text,
$cts:start, and $cts:action
(described below) in the expression.
|
|
Usage Notes:
In addition to a valid Entity Enrichment license key, this function
requires that you have installed the Entity Enrichment package. For
details on installing the Entity Enrichment package, see the
Installation Guide and the "Marking Up Documents With
Entity Enrichment" chapter of the Search Developer's Guide.
There are six built-in variables to represent an entity match.
These variables can be used inline in the expression parameter.
$cts:text as xs:string
The matched text.
$cts:node as text()
The node containing the matched text.
$cts:start as xs:integer
The string-length position of the first character of
$cts:text in $cts:node. Therefore, the following
always returns true:
fn:substring($cts:node, $cts:start,
fn:string-length($cts:text)) eq $cts:text
$cts:action as xs:string
Use xdmp:set on this to specify what should happen
next:
"continue"
- (default) Walk the next match.
If there are no more matches, return all evaluation results.
"skip"
- Skip walking any more matches and return all evaluation results.
"break"
- Stop walking matches and return all evaluation results.
$cts:entity-type as xs:string
The type of the matching entity.
$cts:normalized-text as xs:string
The normalized entity text (only applicable for some
languages).
The following are the entity types returned from the
$cts:entity-type built-in variable (in alphabetical order):
FACILITY
- A place used as a facility.
GPE
- Geo-political entity. Differs from location because it has a
person-made aspect to it (for example, California is a GPE because
its boundaries were defined by a government).
IDENTIFIER:CREDIT_CARD_NUM
- Identifies a credit card number.
IDENTIFIER:DISTANCE
- A number identifying a distance.
IDENTIFIER:EMAIL
- Identifies an email address.
IDENTIFIER:LATITUDE_LONGITUDE
- Latitude and longitude coordinates.
IDENTIFIER:MONEY
- Identifies currency (dollars, euros, and so on).
IDENTIFIER:NUMBER
- Identifies a number.
IDENTIFIER:PERSONAL_ID_NUM
- A number identifying a social security number or other ID
number.
IDENTIFIER:PHONE_NUMBER
- Identifies a telephone number.
IDENTIFIER:URL
- Identifies a web site address (URL).
IDENTIFIER:UTM
- Identifies Universal Transverse Mercator coordinates.
LOCATION
- A geographic location (Mount Everest, for example).
NATIONALITY
- The nationality of someone or something (for example, American).
ORGANIZATION
- An organization.
PERSON
- A person.
RELIGION
- A religion.
TEMPORAL:DATE
- Date-related.
TEMPORAL:TIME
- Time-related.
TITLE
- Appellation or honorific associated with a person.
URL
- A URL on the world wide web.
UTM
- A point in the Universal Transverse Mercator (UTM)
coordinate system.
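Because the expression can branch on $cts:entity-type, you can mark up only the entity types you care about and pass the rest through unchanged. The following is a minimal sketch; the sample text and the <person> wrapper element are assumptions, and the entities actually found depend on your license and the installed Entity Enrichment package.

```xquery
xquery version "1.0-ml";
(: Requires a valid Entity Enrichment license key and package.
   The sample text and the <person> element name are illustrative. :)
let $myxml := <node>George Washington never visited Norway.</node>
return
  cts:entity-highlight($myxml,
    if ($cts:entity-type eq "PERSON")
    then <person>{$cts:text}</person>   (: wrap people only :)
    else $cts:text)                     (: leave other entities as plain text :)
```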
|
Example:
let $myxml := <node>George Washington never visited Norway.
If he had a Social Security number,
it might be 000-00-0001.</node>
return
cts:entity-highlight($myxml,
element { fn:replace($cts:entity-type, ":", "-") } { $cts:text })
=>
<node>
<PERSON>George Washington</PERSON> never visited <GPE>Norway</GPE>.
If he had a Social Security number, it might be
<IDENTIFIER-PERSONAL_ID_NUM>000-00-0001</IDENTIFIER-PERSONAL_ID_NUM>.
</node>
|
cts:highlight(
  $node as node(),
  $query as cts:query,
  $expr as item()*
) as node()
Summary:
Returns a copy of the node, replacing any text matching the query
with the specified expression. You can use this function
to easily highlight any text found in a query. Unlike
fn:replace and other XQuery string functions that match
literal text, cts:highlight matches every term that
matches the search, including stemmed matches or matches with
different capitalization.
|
Parameters:
$node
:
A node to highlight. The node must be either a document node
or an element node; it cannot be a text node.
|
$query
:
A query specifying the text to highlight. If a string
is entered, the string is treated as a cts:word-query of the
specified string.
|
$expr
:
An expression with which to replace each match. You can use the
variables $cts:text, $cts:node,
$cts:queries, $cts:start, and
$cts:action (described below) in the expression.
|
|
Usage Notes:
There are five built-in variables to represent a query match.
These variables can be used inline in the expression parameter.
$cts:text as xs:string
The matched text.
$cts:node as text()
The node containing the matched text.
$cts:queries as cts:query*
The matching queries.
$cts:start as xs:integer
The string-length position of the first character of
$cts:text in $cts:node. Therefore, the following
always returns true:
fn:substring($cts:node, $cts:start,
fn:string-length($cts:text)) eq $cts:text
$cts:action as xs:string
Use xdmp:set on this to specify what should happen
next:
"continue"
- (default) Walk the next match.
If there are no more matches, return all evaluation results.
"skip"
- Skip walking any more matches and return all evaluation results.
"break"
- Stop walking matches and return all evaluation results.
You cannot use cts:highlight to highlight results matching
cts:similar-query and cts:element-attribute-*-query
items. Using cts:highlight with these queries will
return the nodes without any highlighting.
You can also use cts:highlight as a general search
and replace function. The specified expression will replace any matching
text. For example, you could replace the word "hello" with "goodbye"
in a query similar to the following:
cts:highlight($node, "hello", "goodbye")
Because the expressions can be any XQuery expression, they can be very
simple like the above example or they can be extremely complex.
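As a sketch of a slightly more involved expression, the following combines a replacement value with xdmp:set on $cts:action, with the intent of marking up only the first match and leaving later matches untouched. The sample paragraph is an assumption, and the exact handling of matches after a "break" may vary by server version.

```xquery
xquery version "1.0-ml";
let $x := <p>This is 1, this is 2, this is 3.</p>
return
  cts:highlight($x, "this is",
    (<b>{$cts:text}</b>,                 (: replacement for the current match :)
     xdmp:set($cts:action, "break")))    (: stop walking after this match :)
```

The xdmp:set call returns the empty sequence, so the value of the whole expression is just the b element.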
|
Example:
To highlight "MarkLogic" with bold in the following paragraph:
let $x := <p>MarkLogic Server is an enterprise-class
database specifically built for content.</p>
return
cts:highlight($x, "MarkLogic", <b>{$cts:text}</b>)
Returns:
<p><b>MarkLogic</b> Server is an enterprise-class
database specifically built for content.</p>
|
Example:
Given the following document with the URI "hellogoodbye.xml":
<root>
<a>It starts with hello and ends with goodbye.</a>
</root>
The following query will highlight the word "hello" in
blue, and everything else in red.
cts:highlight(doc("hellogoodbye.xml"),
cts:and-query((cts:word-query("hello"),
cts:word-query("goodbye"))),
if (cts:word-query-text($cts:queries) eq "hello")
then (<font color="blue">{$cts:text}</font>)
else (<font color="red">{$cts:text}</font>))
returns:
<root>
<a>It starts with <font color="blue">hello</font>
and ends with <font color="red">goodbye</font>.</a>
</root>
|
Example:
for $x in cts:search(collection(), "MarkLogic")
return
cts:highlight($x, "MarkLogic", <b>{$cts:text}</b>)
returns all of the nodes that contain "MarkLogic",
placing bold markup around the matched words.
|
cts:remainder(
  [$node as node()]
) as xs:integer
Summary:
Returns an estimated search result size for a node,
or for the context node if no node is provided.
The search result size for a node is the number of fragments remaining
(including the current node) in the result sequence containing the node.
This is useful to quickly estimate the size of a search result sequence,
without using fn:count() or xdmp:estimate().
|
Parameters:
$node
(optional):
A node. Typically this is an item in the result sequence of a
cts:search operation. If you specify the first item
from a cts:search expression,
then cts:remainder will return an estimate of the number
of fragments that match that expression.
|
|
Usage Notes:
This function makes it efficient to estimate the size of a search result
and execute that search in the same query. If you only need an estimate of
the size of a search but do not need to run the search, then
xdmp:estimate is more efficient.
To return the estimated size of a search with cts:remainder,
use the first item of a cts:search result sequence as the
parameter to cts:remainder. For example, the following
query returns the estimated number of fragments that contain the word
"dog":
cts:remainder(cts:search(collection(), "dog")[1])
When you put the position predicate on the cts:search result
sequence, MarkLogic Server will filter all of the false-positive results
up to the specified position, but not the false-positive results beyond
the specified
position. Because of this, when you increase the position number in the
parameter, the result from cts:remainder might decrease
by a larger number than the increase in position number, or it might not
decrease at all. For example, if
the query above returned 10, then the following query might return 9, it
might return 10, or it might return less than 9, depending on how the
results are dispersed throughout different fragments:
cts:remainder(cts:search(collection(), "dog")[2])
If you run cts:remainder on a constructed node, it always
returns 0; it is primarily intended to run on nodes that are retrieved
from the database (an item from a cts:search result or an
item from the result of an XPath expression that searches through the
database).
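A common use of this pattern is to return one page of results together with an estimated total. The following is a minimal sketch; the page size, the <results> wrapper element, and its attribute name are assumptions. It guards against an empty result sequence before calling cts:remainder, since there is then no first item to pass.

```xquery
xquery version "1.0-ml";
let $page-size := 10
let $results := cts:search(fn:collection(), "dog")
let $estimate :=
  if (fn:exists($results))
  then cts:remainder($results[1])   (: estimate taken from the first hit :)
  else 0                            (: no hits, so nothing remains :)
return
  <results estimated-total="{$estimate}">{
    fn:subsequence($results, 1, $page-size)
  }</results>
```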
|
Example:
let $x := cts:search(collection(), "dog")
return
(cts:remainder($x[1]), $x)
=> Returns the estimated number of items in the search
for "dog" followed by the results of the search.
|
Example:
xdmp:document-insert("/test.xml", <a>my test</a>);
for $x in cts:search(collection(),"my test")
return cts:remainder($x) => 1
|
Example:
for $a in cts:search(collection(),"my test")
where $a[cts:remainder() eq 1]
return xdmp:node-uri($a) => /test.xml
|
cts:search(
  $expression as node()*,
  $query as cts:query?,
  [$options as xs:string*],
  [$quality-weight as xs:double?],
  [$forest-ids as xs:unsignedLong*]
) as node()*
Summary:
Returns a relevance-ordered sequence of nodes specified by a given query.
|
Parameters:
$expression
:
An expression to be searched.
This must be an inline fully searchable path expression.
|
$query
:
A cts:query specifying the search to perform. If a string
is entered, the string is treated as a cts:word-query of the
specified string.
|
$options
(optional):
Options to this search. The default is ().
Options include:
"filtered"
A filtered search (the default). Filtered searches
eliminate any false-positive matches and properly resolve cases where
there are multiple candidate matches within the same fragment.
Filtered search results fully satisfy the specified
cts:query.
"unfiltered"
An unfiltered search. An unfiltered search
selects fragments from the indexes that are candidates to satisfy
the specified cts:query, and then it returns
a single node from within each fragment that satisfies the specified
searchable path expression. Unfiltered searches are useful because
of the performance they afford when jumping deep into the
result set (for example, when paginating a long result set and
jumping to the 1,000,000th result). However, depending on the
searchable path expression, the
cts:query specified, the structure of the documents in
the database, and the configuration of the database, unfiltered
searches may yield false-positive results being included in the
search results. Unfiltered searches may also result in missed
matches or in incorrect matches, especially when there are
multiple candidate matches within a single fragment.
To avoid these problems, you should only use unfiltered searches
on top-level XPath expressions (for example, document nodes,
collections, directories) or on fragment roots. Using unfiltered
searches on complex XPath expressions or on XPath expressions that
traverse below a fragment root can result in unexpected results.
"score-logtfidf"
Compute scores using the logtfidf method (the default scoring
method). This uses the formula:
log(term frequency) * (inverse document frequency)
"score-logtf"
Compute scores using the logtf method. This does not take into
account how many documents have the term and uses the formula:
log(term frequency)
"score-simple"
Compute scores using the simple method. The score-simple
method gives a score of 8*weight for each matching term in the
cts:query expression. It does not matter how
many times a given term matches (that is, the term
frequency does not matter); each match contributes 8*weight
to the score. For example, the following query (assume the
default weight of 1) would give a score of 8 for
any fragment with one or more matches for "hello", a score of 16
for any fragment that also has one or more matches for "goodbye",
or a score of zero for fragments that have no matches for
either term:
cts:or-query(("hello", "goodbye"))
"score-random"
Compute scores using the random method. The score-random
method gives a random value to the score. You can use this
to randomly choose fragments matching a query.
"checked"
Word positions are checked (the default) when resolving
the query. Checked searches eliminate false-positive matches for
phrases during the index resolution phase of search processing.
"unchecked"
Word positions are not checked when resolving the
query. Unchecked searches do not take into account word positions
and can lead to false-positive matches during the index resolution
phase of search processing. This setting is useful
for debugging, but not recommended for normal use.
|
$quality-weight
(optional):
A document quality weight to use when computing scores.
The default is 1.0.
|
$forest-ids
(optional):
A sequence of IDs of forests to which the search will be constrained.
An empty sequence means to search all forests in the database.
The default is (). You can use cts:search with this
parameter and an empty cts:and-query to specify a
forest-specific XPath statement (see the third
example below). If you
use this to constrain an XPath to one or more forests, you should set
the quality-weight to zero to keep the XPath document
order.
|
|
Usage Notes:
Queries that use cts:search require that the XPath expression
searched is fully searchable. A fully searchable path is one that
has no steps that are unsearchable and whose last step is searchable.
You can use the
xdmp:query-trace() function to see if the path is fully
searchable. If there are no entries in the xdmp:query-trace()
output indicating that a step is unsearchable, and if the last step
is searchable, then that path is fully
searchable. Queries that use cts:search on unsearchable
XPath expressions will fail with an error message. You can often make
the path expressions fully searchable by rewriting the query or adding
new indexes.
Each node that cts:search returns has a score with which
it is associated. To access the score, use the cts:score
function. The nodes are returned in relevance order (most relevant to least
relevant), where more relevant nodes have a higher score.
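The following sketch lists the first ten results of a search along with the score of each. The search terms, the "score-simple" option, and the shape of the <hit> element are illustrative assumptions.

```xquery
xquery version "1.0-ml";
for $hit in fn:subsequence(
              cts:search(fn:collection(),
                         cts:or-query(("hello", "goodbye")),
                         "score-simple"),
              1, 10)
return
  (: cts:score reports the relevance score of a node returned by cts:search :)
  <hit uri="{xdmp:node-uri($hit)}" score="{cts:score($hit)}"/>
```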
Only one of the "filtered" or "unfiltered" options may be specified
in the options parameter. If neither "filtered" nor "unfiltered" is
specified, then the default is "filtered".
Only one of the "score-logtfidf", "score-logtf", "score-simple",
or "score-random" options may be specified in the options parameter.
If none of "score-logtfidf", "score-logtf", "score-simple", or
"score-random" are specified, then the default is "score-logtfidf".
Only one of the "checked" or "unchecked" options may be specified
in the options parameter. If neither "checked" nor "unchecked" is
specified, then the default is "checked".
If the cts:query specified is the empty string (equivalent
to cts:word-query("")), then the search returns the empty
sequence.
|
Example:
cts:search(//SPEECH,
cts:word-query("with flowers"))
=> ... a sequence of 'SPEECH' element ancestors (or self)
of any node containing the phrase 'with flowers'.
|
Example:
cts:search(collection("self-help")/book,
cts:element-query(xs:QName("title"), "meditation"),
"score-simple", 1.0, (xdmp:forest("prod"),xdmp:forest("preview")))
=> ... a sequence of book elements matching the XPath
expression which are members of the "self-help"
collection, reside in the "prod" or "preview" forests and
contain "meditation" in the title element, using the
"score-simple" option.
|
Example:
cts:search(/some/xpath, cts:and-query(()), (), 0.0,
xdmp:forest("myForest"))
=> ... a sequence of /some/xpath elements that are
in the forest named "myForest". Note the
empty and-query, which matches all documents (and
scores them all the same) and the quality-weight
of 0, which together make each result have a score
of 0, which keeps the results in document order.
|
cts:tokenize(
  $text as xs:string,
  [$language as xs:string?]
) as cts:token*
Summary:
Tokenizes text into words, punctuation, and spaces. Returns output in
the type cts:token, which has subtypes
cts:word, cts:punctuation, and
cts:space, all of which are subtypes of
xs:string.
|
Parameters:
$text
:
A word or phrase to tokenize.
|
$language
(optional):
A language to use for tokenization. If not supplied, it uses the
database default language.
|
|
Usage Notes:
When you tokenize a string with cts:tokenize, each word is
represented by an instance of
cts:word, each punctuation character
is represented by an instance of cts:punctuation,
each set of adjacent spaces is represented by an instance of
cts:space, and each set of adjacent line breaks
is represented by an instance of cts:space.
Unlike the standard XQuery function fn:tokenize,
cts:tokenize returns words, punctuation, and spaces
as different types. You can therefore use a typeswitch to handle each type
differently. For example, you can use cts:tokenize to remove
all punctuation from a string, or create logic to test for the type and
return different things for different types, as shown in the first
two examples below.
You can use xdmp:describe to show how a given string will be
tokenized. When run on the results of cts:tokenize, the
xdmp:describe function returns the types and the values
for each token. For a sample of this pattern, see the third example below.
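Because each token carries its type, you can also filter the token sequence with instance of rather than a typeswitch. The following sketch counts the tokens of each type in a string; the explicit "en" language argument and the <token-counts> element are assumptions, and the counts depend on the tokenization rules for the chosen language.

```xquery
xquery version "1.0-ml";
let $string := "The red, blue, green, and orange balloons were launched!"
let $tokens := cts:tokenize($string, "en")
return
  <token-counts
     words="{fn:count($tokens[. instance of cts:word])}"
     punctuation="{fn:count($tokens[. instance of cts:punctuation])}"
     spaces="{fn:count($tokens[. instance of cts:space])}"/>
```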
|
Example:
(: Remove all punctuation :)
let $string := "The red, blue, green, and orange
balloons were launched!"
let $noPunctuation :=
for $token in cts:tokenize($string)
return
typeswitch ($token)
case $token as cts:punctuation return ""
case $token as cts:word return $token
case $token as cts:space return $token
default return ()
return string-join($noPunctuation, "")
=> The red blue green and orange
balloons were launched
|
Example:
(: Insert the string "XX" before and after
all punctuation tokens :)
let $string := "The red, blue, green, and orange
balloons were launched!"
let $tokens := cts:tokenize($string)
return string-join(
for $x in $tokens
return if ($x instance of cts:punctuation)
then (concat("XX",
$x, "XX"))
else ($x) , "")
=> The redXX,XX blueXX,XX greenXX,XX and orange
balloons were launchedXX!XX
|
Example:
(: show the types and tokens for a string :)
xdmp:describe(cts:tokenize("blue, green"))
=> (cts:word("blue"), cts:punctuation(","),
cts:space(" "), cts:word("green"))
|
cts:walk(
  $node as node(),
  $query as cts:query,
  $expr as item()*
) as item()*
Summary:
Walks a node, evaluating an expression with any text matching a query.
It returns a sequence of all the values returned by the expression
evaluations. This is similar to cts:highlight in how it
evaluates its expression, but it is different in what it returns.
|
Parameters:
$node
:
A node to walk. The node must be either a document node
or an element node; it cannot be a text node.
|
$query
:
A query specifying the text on which to evaluate the expression.
If a string is entered, the string is treated as a
cts:word-query of the specified string.
|
$expr
:
An expression to evaluate with matching text. You can use the
variables $cts:text, $cts:node,
$cts:queries, $cts:start, and
$cts:action (described below) in the expression.
|
|
Usage Notes:
There are five built-in variables to represent a query match.
These variables can be used inline in the expression parameter.
$cts:text as xs:string
The matched text.
$cts:node as text()
The node containing the matched text.
$cts:queries as cts:query*
The matching queries.
$cts:start as xs:integer
The string-length position of the first character of
$cts:text in $cts:node. Therefore, the following
always returns true:
fn:substring($cts:node, $cts:start,
fn:string-length($cts:text)) eq $cts:text
$cts:action as xs:string
Use xdmp:set on this to specify what should happen
next:
"continue"
- (default) Walk the next match.
If there are no more matches, return all evaluation results.
"skip"
- Skip walking any more matches and return all evaluation results.
"break"
- Stop walking matches and return all evaluation results.
You cannot use cts:walk to walk results matching
cts:similar-query and cts:element-attribute-*-query
items.
Because the expressions can be any XQuery expression, they can be very
simple, as in the first example below, or they can be extremely complex.
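Since cts:walk returns the values of the expression rather than a modified copy of the node, it is a convenient way to build a report of matches. The following sketch emits one element per match, recording where the match starts in its text node; the <match> element and its attribute names are assumptions.

```xquery
xquery version "1.0-ml";
let $x := <p>the quick brown fox <b>jumped</b> over the lazy dog's back</p>
return
  cts:walk($x, "the",
    (: $cts:start is relative to the text node that contains the match :)
    <match start="{$cts:start}" length="{fn:string-length($cts:text)}">{
      $cts:text
    }</match>)
```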
|
Example:
(:
Return all text nodes containing matches to the query "the".
:)
let $x := <p>the quick brown fox <b>jumped</b> over the lazy dog's back</p>
return cts:walk($x, "the", $cts:node)
=>
(text{"the quick brown fox "}, text{" over the lazy dog's back"})
|
Example:
xquery version "1.0-ml";
(:
Do not show any more matches that occur after
$threshold characters.
:)
let $x := <p>This is 1, this is 2, this is 3, this is 4, this is 5.</p>
let $pos := 1
let $threshold := 20
return
cts:walk($x, "this is",
(if ( $pos gt $threshold )
then xdmp:set($cts:action, "break")
else ($cts:text, xdmp:set($pos, $cts:start)) ) )
=>
("This is", "this is", "this is")
|
Example:
xquery version "1.0-ml";
(:
Show the first two matches.
:)
let $x := <p>This is 1, this is 2, this is 3, this is 4, this is 5.</p>
let $match := 0
let $threshold := 2
return
cts:walk($x, "this is",
(if ( $match ge $threshold )
then xdmp:set($cts:action, "break")
else ($cts:text, xdmp:set($match, $match + 1)) ) )
=>
("This is", "this is")
|
|
|