Creating WebDAV Server With Search Support (DASL), Java

IT Hit WebDAV Server Engine supports WebDAV DASL-compliant search.

Search Interface

To enable search support, implement the Search interface on the folder items that must support search. This interface provides a single method, Search.search(), which the Engine calls when it receives a search request, passing the search phrase and search options:

public interface Search extends Folder {

     PageResults search(String searchString, SearchOptions options, List<Property> propNames, Long offset, Long nResults);
}

In this method implementation, you query your storage and return the items that correspond to the search request. You are free to return items found in a subtree, in this folder only or on the entire server; this depends entirely on your implementation.

The SearchOptions parameter contains two boolean flags that indicate where the search should be performed: in file names, in file content or in both.

When IT Hit Ajax File Browser is used as a search client, it requests a file name search only when displaying the pop-up hint:

if (options.isSearchName() && !options.isSearchContent()) {
    // Ajax File Browser pop-up hint request
}

When requesting search results, both SearchOptions.isSearchName() and SearchOptions.isSearchContent() return true.

Discovering DASL Search Support

To find out whether your server supports search, a WebDAV client submits an OPTIONS request on a folder. If the Engine discovers the Search interface on that folder, it returns the SEARCH token in the Allow header as well as the DASL: <DAV:basicsearch> header.

See how IT Hit Ajax Browser is discovering Search support.
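
The check a client performs on the OPTIONS response can be sketched as follows. This is a minimal illustration, not code from the Engine or from Ajax File Browser: the DaslDiscovery class and its supportsBasicSearch method are hypothetical names, and the sketch assumes the response headers are available as a map of header names to value lists.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical client-side helper: inspects OPTIONS response headers for DASL basic search support. */
final class DaslDiscovery {

    /**
     * Returns true when the Allow header lists the SEARCH method and the
     * DASL header advertises <DAV:basicsearch>. Header names are compared
     * case-insensitively.
     */
    static boolean supportsBasicSearch(Map<String, List<String>> headers) {
        boolean searchAllowed = false;
        boolean basicSearch = false;
        for (Map.Entry<String, List<String>> header : headers.entrySet()) {
            if (header.getKey() == null) {
                continue; // some HTTP clients store the status line under a null key
            }
            String name = header.getKey().toLowerCase(Locale.ROOT);
            for (String value : header.getValue()) {
                if (name.equals("allow")) {
                    // The Allow header is a comma-separated list of method names.
                    for (String method : value.split(",")) {
                        searchAllowed |= method.trim().equals("SEARCH");
                    }
                } else if (name.equals("dasl")) {
                    basicSearch |= value.contains("<DAV:basicsearch>");
                }
            }
        }
        return searchAllowed && basicSearch;
    }
}
```

With the JDK java.net.http.HttpClient, for example, the map returned by response.headers().map() could be passed to this method after sending a request built with method("OPTIONS", HttpRequest.BodyPublishers.noBody()).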

Examples of DASL Search Request

The WebDAV Engine supports DASL basic search. It can process a file content search request, a file name search request or both. Here is an example of search XML that searches for words starting with “general” in file names or in file content:

<?xml version="1.0"?>
<searchrequest xmlns="DAV:">
  <basicsearch>
    <select>
      <prop>
        <resourcetype/>
        <displayname/>
        <creationdate/>
        <getlastmodified/>
        <getcontenttype/>
        <getcontentlength/>
        <supportedlock/>
        <lockdiscovery/>
        <quota-available-bytes/>
        <quota-used-bytes/>
        <checked-in/>
        <checked-out/>
      </prop>
    </select>
    <where>
      <or>
        <like>
          <prop>
            <displayname/>
          </prop>
          <literal>general%</literal>
        </like>
        <contains>general%</contains>
      </or>
    </where>
  </basicsearch>
</searchrequest>
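
A client could assemble such a request body programmatically. The sketch below is illustrative only: DaslRequestBuilder, build and xmlEscape are hypothetical names, the select list is shortened to two properties, and the output mirrors the shape of the request above rather than covering every element DASL allows.

```java
/** Hypothetical helper that builds a DASL basicsearch request body. */
final class DaslRequestBuilder {

    /** Escapes the characters that are special in XML text content. */
    static String xmlEscape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Builds the request XML. The pattern may contain DASL wildcards.
     * At least one of searchName and searchContent must be true.
     */
    static String build(String pattern, boolean searchName, boolean searchContent) {
        if (!searchName && !searchContent) {
            throw new IllegalArgumentException("Nothing to search: enable name or content search");
        }
        String escaped = xmlEscape(pattern);
        // File name condition: <like> on the displayname property.
        String like = "<like><prop><displayname/></prop><literal>" + escaped + "</literal></like>";
        // File content condition: <contains>.
        String contains = "<contains>" + escaped + "</contains>";
        String condition;
        if (searchName && searchContent) {
            condition = "<or>" + like + contains + "</or>";
        } else {
            condition = searchName ? like : contains;
        }
        return "<?xml version=\"1.0\"?>"
                + "<searchrequest xmlns=\"DAV:\"><basicsearch>"
                + "<select><prop><displayname/><getcontentlength/></prop></select>"
                + "<where>" + condition + "</where>"
                + "</basicsearch></searchrequest>";
    }
}
```

Calling DaslRequestBuilder.build("general%", true, false) would produce a name-only request of the kind described for the pop-up hint, while passing true for both flags produces the or condition shown in the XML above.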

Search Phrase Wildcards and Escaping

The DASL search phrase can contain wildcard characters and escape sequences according to DASL rules:

  • ‘%’ – matches zero or more characters.
  • ‘_’ – matches exactly one character.

To search for a literal ‘%’, ‘_’ or ‘\’ character, escape it in the search phrase as ‘\%’, ‘\_’ or ‘\\’.
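
The wildcard and escaping rules can be illustrated with a small sketch. The DaslWildcards class and both method names are hypothetical. The second method assumes a target syntax in which ‘*’ and ‘?’ are the wildcards and a backslash escapes them, which is the convention commonly used for wildcard queries in search libraries; verify it against the query syntax of your own index.

```java
/** Hypothetical helpers for DASL search phrase wildcards and escaping. */
final class DaslWildcards {

    /** Client side: escapes a literal phrase so '%', '_' and '\' are not treated as wildcards. */
    static String escapeLiteral(String phrase) {
        StringBuilder out = new StringBuilder();
        for (char c : phrase.toCharArray()) {
            if (c == '%' || c == '_' || c == '\\') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Server side: converts a DASL pattern to '*' / '?' syntax, honoring backslash escapes. */
    static String toStarQuestion(String pattern) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // An escaped character is a literal, not a wildcard.
                appendLiteral(out, pattern.charAt(++i));
            } else if (c == '%') {
                out.append('*');
            } else if (c == '_') {
                out.append('?');
            } else {
                appendLiteral(out, c);
            }
        }
        return out.toString();
    }

    /** Appends a literal character, escaping it if it is special in the target syntax. */
    private static void appendLiteral(StringBuilder out, char c) {
        if (c == '*' || c == '?' || c == '\\') {
            out.append('\\');
        }
        out.append(c);
    }
}
```

Unlike a plain character replacement, this conversion keeps an escaped ‘\%’ as a literal percent sign instead of turning it into a wildcard.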

Example of Full-Text Search on Files Stored in File System

In the example below, we use Apache Lucene indexing to search file names and file content. Lucene can search the content of Microsoft Office documents as well as any other document formats supported by the Apache Tika content analysis toolkit.

Below is the Search interface implementation. It uses the Searcher class, which utilizes the Lucene index to search file names and content:

PageResults search(String searchString, SearchOptions options, List<Property> propNames, Long offset, Long nResults) {
    List<HierarchyItem> results = new LinkedList<>();
    Searcher searcher = getEngine().getSearcher();
    if (searcher == null) {
        // No index is available: return an empty result set.
        return new PageResults(results, (long) results.size());
    }
    try {
        // Limit the search to the subtree of this folder.
        String decodedPath = decode(getPath());
        Set<String> paths = searcher.search(searchString, options, decodedPath);
        for (String path : paths) {
            try {
                HierarchyItem item = getHierarchyItem(path, getEngine());
                if (item != null) {
                    results.add(item);
                }
            } catch (Exception ex) {
                // Skip items that cannot be loaded and continue with the rest.
                getEngine().getLogger().logError("Error during search.", ex);
            }
        }
    } catch (ServerException e) {
        getEngine().getLogger().logError("Error during search.", e);
    }
    // Apply paging only when both offset and nResults are specified.
    List<HierarchyItem> page = (offset != null && nResults != null)
            ? results.stream().skip(offset).limit(nResults).collect(Collectors.toList())
            : results;
    // The second argument is the total number of matches, not the page size.
    return new PageResults(page, (long) results.size());
}
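
The paging step at the end of the method above can be isolated into a small generic helper. This is a sketch under assumptions: the Paging class and its page method are hypothetical names, and clamping negative values to zero is an added safeguard that the sample itself does not perform.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that extracts one page from a full list of search results. */
final class Paging {

    /**
     * Returns the requested page, or the whole list when offset or nResults is null.
     * Negative values are treated as zero, because Stream.skip and Stream.limit
     * reject negative arguments.
     */
    static <T> List<T> page(List<T> all, Long offset, Long nResults) {
        if (offset == null || nResults == null) {
            return all;
        }
        return all.stream()
                .skip(Math.max(0L, offset))
                .limit(Math.max(0L, nResults))
                .collect(Collectors.toList());
    }
}
```

The total passed to PageResults should still be the size of the full list, so that the client can tell how many pages exist.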

The Searcher accepts the search line, the search options and the parent path, which is the path of the folder in which the search is performed:

Set<String> search(String searchLine, SearchOptions options, String parent) {
    // Translate DASL wildcards to Lucene wildcards.
    // Note: escaped wildcards (\% and \_) are not handled here.
    searchLine = searchLine.replace("%", "*");
    searchLine = searchLine.replace("_", "?");
    // LinkedHashSet removes duplicates when an item matches by both name and content.
    Set<String> paths = new LinkedHashSet<>();
    try (IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(indexFolder)))) {
        indexSearcher = new IndexSearcher(reader);
        if (options.isSearchName()) {
            paths.addAll(searchName(searchLine, parent));
        }
        if (options.isSearchContent()) {
            paths.addAll(searchContent(searchLine, parent));
        }
    } catch (Exception e) {
        logger.logError("Error while doing index search.", e);
    }
    return paths;
}

 

To see the complete code, examine the File System Storage sample.

Example of Full-Text Search on Files Stored in a Database

To search files stored in an Oracle database, you can also use Apache Lucene. The only difference is that in this sample the index returns the IDs of the database records whose name or content matches the search conditions, rather than file paths.

Below you can see an example of the Search.search implementation that searches Oracle:

    PageResults search(String searchString, SearchOptions options, List<Property> propNames, Long offset, Long nResults) {
        List<HierarchyItem> results = new LinkedList<>();
        Searcher searcher = getEngine().getSearcher();
        if (searcher == null) {
            return new PageResults(results, (long) results.size());
        }
        Set<String> ids;
        ids = searcher.search(searchString, options);
        for (String id : ids) {
            try {
                // Constructing item path using Oracle capabilities
                String path = getDataAccess().executeScalar("select path from " +
                        "  (SELECT id, SYS_CONNECT_BY_PATH(name, '/') path " +
                        "   FROM REPOSITORY where id = (select min(id) from REPOSITORY) " +
                        "   START WITH id = ? " +
                        "   CONNECT BY id = PRIOR parent and parent!= prior id)", id);
                String[] pathParts = path.split("/");
                pathParts = Arrays.copyOf(pathParts, pathParts.length-1);
                StringBuilder pathBuilder = new StringBuilder();
                for (int i = pathParts.length - 1 ; i>=0 ; i--) {
                    if (Objects.equals(pathParts[i], "")) {
                        continue;
                    }
                    pathBuilder.append("/");
                    pathBuilder.append(pathParts[i]);
                }
                String itemPath = pathBuilder.toString();
                String decodedPath = getDataAccess().decode(getPath());
                if (itemPath.startsWith(decodedPath)) {
                    HierarchyItem item = getDataAccess().getFile(Integer.valueOf(id), itemPath);
                    if (item != null) {
                        results.add(item);
                    }
                }
            } catch (Exception ex) {
                getEngine().getLogger().logError("Error during search.", ex);
            }
        }
        return new PageResults(results, (long) results.size());
    }

The implementation of the Searcher is almost the same as in the File System Storage sample.

Indexing

All samples use Apache Lucene for indexing and Apache Tika as the content analysis toolkit to build the index. However, because indexing is not part of the WebDAV protocol, how you build your index depends entirely on your implementation.

To see the complete code, examine the SQL Storage sample or DeltaV Storage sample.

 
