v9.80 (build: Jul 4 2023)

Search in files

You can enable indexing of document files on client machines so that documents can later be searched quickly by keywords or regular expressions within their content. Search is available in BOSS-Online (the "Search in files" function) and in the BOSS-Offline "Search in files" report (when periodic search is enabled).

Attention: make sure that the database user is allowed access to the "Search in files" functions for BOSS-Offline/BOSS-Online on the rights settings page!

Settings
The indexing process may take a long time, and the index itself can take up a significant amount of space on the system disk of client machines. The functionality therefore provides a number of settings for different needs:

providers.searchEngine.idlePeriod - time in msec since the last user activity after which the scanner enters the active scanning phase

providers.searchEngine.rescanPeriod - time in msec to wait after the previous scan cycle completes before the next cycle starts

providers.searchEngine.scanRateIdle - scanning intensity while the user is inactive (active phase), from 0.01 to 1.00

providers.searchEngine.scanRateUsed - scanning intensity while the user is active (passive phase), from 0.01 to 1.00

providers.searchEngine.scanRateForced - not used in the current version

providers.searchEngine.scanRateFrozen - not used in the current version

providers.searchEngine.maxFieldLength - max. number of words in the document to analyze

providers.searchEngine.maxDocCharsToAnalyze - max. size of the document retrieved from the index, used to find the best chunk or when directly querying the entire document

providers.searchEngine.maxSizeFile - max. file size to analyze in bytes

providers.searchEngine.maxSizeNest - max. archive size to analyze in bytes

providers.searchEngine.maxDepthDir - max. nesting depth of directories/folders when scanning

providers.searchEngine.maxDepthNest - max. nesting depth of archives within other archives

providers.searchEngine.maxSizeIndex - max. index size in bytes

providers.searchEngine.minIndexFreeSpace - min. free space in bytes on the system disk required for scanning and index building to take place

providers.searchEngine.indexedFileLivetime - lifetime (in msec) of an indexed file's entry in the database after the original file has been deleted.
Attention! This value should not be shorter than a full scan cycle, so setting it to less than 2-3 days is not recommended!

providers.searchEngine.indexableFileTypes - indexable file types.
In the current version, the supported formats are: zip,7z,rar,txt,csv,htm,html,eml,mht,pdf,doc,docx,xlsx,pptx,odt,ods,odp,odg

providers.searchEngine.detectFileType - when enabled, the file type is recognized by its signature rather than by the file name extension.
Attention! Enabling this parameter increases the indexing cycle time and significantly increases the size of the index on disk!

providers.searchEngine.excludeFileMasks - file masks to exclude from the scanning process

providers.searchEngine.excludeDirMasks - folder names (without paths) to exclude from the scanning process (masks are also allowed)

providers.searchEngine.storeIndexedContent - whether to store the contents of the found fragments in the index: true increases the size of the index but allows fragments of the found text to be shown, not just the path to the document; false does not store them

providers.searchEngine.includePath.1 - path to scan. Specify an asterisk (*) to scan all drives except network drives. To scan a specific logical disk, specify a path of the form \\?\C:\
Environment variables (%variable%) are not allowed here!

providers.searchEngine.includePath.2 - if necessary, you can specify second, third, etc. paths to scan

providers.searchEngine.ignoreRemovableDisks - if set to false, removable disks are also scanned in all-drives mode (when * is specified as the path)

providers.searchEngine.minTimestampFile - a date in YYYY-MM-DD format; files modified before this date are not indexed

providers.searchEngine.regExpOverlapSize - maximum string length that can be found using regular expressions

providers.searchEngine.regExpMatchLimit - the maximum number of matches that can be found for one regular expression in one document

providers.searchEngine.textAnalyzer - language analyzer used to break text into terms, possible values: Standard, Czech, Dutch, English, French, German, Russian

providers.searchEngine.fragmentsLimit - the maximum number of text fragments that can be returned in search results for one document
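To make the timing settings above more concrete, here is a minimal Python sketch. It is not BOSS code: it assumes the scanner simply switches between scanRateIdle and scanRateUsed depending on whether idlePeriod has elapsed since the last user activity, and the helper names (days_to_ms, current_scan_rate, livetime_is_safe) and example values are hypothetical. It also shows the arithmetic behind the 2-3 day recommendation for indexedFileLivetime: one day is 86,400,000 ms, so 3 days is 259,200,000 ms.

```python
# Hypothetical model of the timing settings; not BOSS source code.
# Assumption: the scanner only switches between two intensities,
# scanRateIdle and scanRateUsed, based on time since last user activity.

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day


def days_to_ms(days: float) -> int:
    """Convert days into the msec units used by the settings."""
    return int(days * MS_PER_DAY)


def current_scan_rate(ms_since_activity: int, settings: dict) -> float:
    """Return the scanning intensity for the current moment (assumed logic)."""
    if ms_since_activity >= settings["idlePeriod"]:
        rate = settings["scanRateIdle"]  # active phase: user inactive
    else:
        rate = settings["scanRateUsed"]  # passive phase: user working
    if not 0.01 <= rate <= 1.00:
        raise ValueError(f"scan rate {rate} is outside 0.01..1.00")
    return rate


def livetime_is_safe(settings: dict, full_cycle_ms: int) -> bool:
    """indexedFileLivetime should not be shorter than a full scan cycle."""
    return settings["indexedFileLivetime"] >= full_cycle_ms


# Example values only, not product defaults.
settings = {
    "idlePeriod": 5 * 60 * 1000,            # 5 minutes
    "scanRateIdle": 1.00,
    "scanRateUsed": 0.10,
    "indexedFileLivetime": days_to_ms(3),  # 259,200,000 ms
}
```

The length of a full scan cycle depends on how much data a client machine holds, so the full_cycle_ms argument stands in for a value you would have to estimate for your own environment.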

Note: each time these settings are changed, the index is completely rebuilt, i.e. the accumulated data is deleted and the scanning process starts over.
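The following sketch shows how the file-level limits listed above (maxSizeFile, minTimestampFile, excludeFileMasks, excludeDirMasks, indexableFileTypes) could combine into a single yes/no decision for one file. It is a hypothetical illustration, not the product's implementation: it assumes masks use shell-style wildcards (* and ?) matched case-insensitively, that excludeDirMasks is compared against each folder name in the path, and that detectFileType is disabled so the type comes from the extension. The names should_index, matches_any and cfg are illustrative only.

```python
# Hypothetical per-file filter; not the product's implementation.
# Assumptions: masks use shell-style wildcards (* and ?), names are
# compared case-insensitively, and the type is taken from the extension
# (as when detectFileType is disabled).
from datetime import datetime
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath

INDEXABLE = set(
    "zip,7z,rar,txt,csv,htm,html,eml,mht,pdf,"
    "doc,docx,xlsx,pptx,odt,ods,odp,odg".split(",")
)


def matches_any(name: str, masks: list) -> bool:
    return any(fnmatchcase(name.lower(), m.lower()) for m in masks)


def should_index(path: str, size: int, mtime: datetime, cfg: dict) -> bool:
    p = PureWindowsPath(path)
    if size > cfg["maxSizeFile"]:
        return False  # larger than maxSizeFile
    if mtime < datetime.strptime(cfg["minTimestampFile"], "%Y-%m-%d"):
        return False  # modified before minTimestampFile
    if matches_any(p.name, cfg["excludeFileMasks"]):
        return False
    # excludeDirMasks are folder names without paths: test every folder
    # in the path except the drive root (the path's anchor).
    folders = [f for f in p.parent.parts if f != p.anchor]
    if any(matches_any(f, cfg["excludeDirMasks"]) for f in folders):
        return False
    return p.suffix.lower().lstrip(".") in INDEXABLE


# Example configuration (illustrative values only).
cfg = {
    "maxSizeFile": 50 * 1024 * 1024,
    "minTimestampFile": "2020-01-01",
    "excludeFileMasks": ["~$*"],
    "excludeDirMasks": ["Temp*", "node_modules"],
}
```

With this configuration, a recent report.docx under Documents passes, while Office lock files (~$...), anything inside a Temp folder, unsupported types such as .png, and files over 50 MB are skipped.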


Enable periodic search
When enabled, the "Search query" is run periodically, every "Search interval" hours, and the results are sent to the server for the "Search in files" report.

Max search results
The number of search results (i.e. documents in which the query was found) to use. Results are sorted by relevance. Possible values: 1 to 255.

Retrospective of results (days)
Once the query has been found in a document, there is no point in sending information about that document to the server on every search iteration, as this would produce duplicate results in the report. This parameter sets how many days information about found results is kept (0 means it is kept indefinitely); once this number of days has passed, the result is sent to the report again.
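A minimal sketch of this deduplication rule, assuming the client records when each document was last reported and reports it again only once the retrospective period has passed, with 0 meaning the record is kept forever. The class and method names are hypothetical.

```python
# Hypothetical sketch of "Retrospective of results (days)".
# Assumption: the client remembers when each document was last reported
# and reports it again only after the retrospective period has passed.
from datetime import datetime, timedelta


class ReportedResults:
    def __init__(self, retention_days: int):
        self.retention_days = retention_days  # 0 = remember indefinitely
        self._last_sent = {}  # document path -> datetime it was last reported

    def should_send(self, doc_path: str, now: datetime) -> bool:
        """Return True (and record the time) if the result should be reported."""
        sent_at = self._last_sent.get(doc_path)
        if sent_at is not None:
            if self.retention_days == 0:
                return False  # already reported; kept forever
            if now - sent_at < timedelta(days=self.retention_days):
                return False  # still inside the retrospective window
        self._last_sent[doc_path] = now
        return True
```

With a 7-day retrospective, a document found on day 0 is reported, silently skipped on day 3, and reported again from day 7 onwards.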

Search query
Enter the search query here.
A line break in the query is equivalent to a space character.
The query format is described in full at https://lucene.apache.org/core/2_9_4/queryparsersyntax.html; this syntax is also slightly extended.
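When queries are generated by a script rather than typed by hand, small helpers can keep them well-formed. This is a sketch under two assumptions: that the product follows the standard Lucene rule that inside a quoted phrase only the backslash and the double quote need escaping, and that the product's syntax extensions (such as @NAME@, type:, digest: and flags:) are written as plain clauses. The function names are hypothetical.

```python
# Hypothetical query-building helpers; not part of BOSS.
# Assumption: standard Lucene rules apply inside quoted phrases, where
# only the backslash and the double quote have to be escaped.
import re


def normalize(query: str) -> str:
    """Treat line breaks as spaces and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", query).strip()


def phrase(text: str) -> str:
    """Exact-phrase clause: escape \\ and ", then wrap in double quotes."""
    escaped = normalize(text).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def all_of(*clauses: str) -> str:
    """Combine clauses so that all of them must match (Lucene AND)."""
    return " AND ".join(clauses)
```

For example, all_of("type:docx", "@IPv4@", "personal") yields type:docx AND @IPv4@ AND personal, and phrase("word1\nword2") yields "word1 word2".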

Examples:

Search for documents that contain both the words test and software:
test AND software

Exact phrase search (enclose the phrase in double quotes):
"word1 word2"

Search for docx documents that contain a match for the regular expression @IPv4@ and the word "personal":
type:docx AND @IPv4@ AND personal

Search for a file by its MD5 checksum:
digest:3fcdcb42d0797d0b08c52e9d214b4ad2

Search for encrypted files (for example, password-protected archives):
flags:encrypted OR flags:unsupported
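To use the digest: query you first need the MD5 checksum of the file. The short Python sketch below uses the standard hashlib module and reads the file in chunks so large files are not loaded into memory at once. It assumes the index stores the MD5 of the file's complete content as 32 lowercase hex characters, as in the digest: example; the helper names are hypothetical.

```python
# Sketch: build a digest: query for a local file.
# Assumption: the index stores the MD5 of the whole file content as
# lowercase hex, matching the digest: example.
import hashlib
from typing import Iterable, Iterator


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file in 1 MiB pieces."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()  # 32 lowercase hex characters


def digest_query(path: str) -> str:
    """Build a query such as digest:3fcdcb42d0797d0b08c52e9d214b4ad2."""
    return "digest:" + md5_of_chunks(read_chunks(path))
```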

Note: only the regular expressions from this list can be searched for; specify the desired regular expression in the format @NAME@ (names are case-sensitive!).
To find any of several regular expressions (an "OR" condition), use @||@.
To find all of several regular expressions (an "AND" condition), use @&&@.
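The regExpOverlapSize and regExpMatchLimit settings suggest that regular expressions are applied to document text piece by piece. The sketch below shows one plausible reading of those names, which is an assumption and not documented behavior. The text is cut into fixed-size chunks, each chunk is scanned together with the next overlap characters, and only matches no longer than overlap are accepted, so a match is never reported in truncated form at a chunk boundary. The function find_matches and its parameters are hypothetical.

```python
# Hypothetical illustration of regExpOverlapSize / regExpMatchLimit.
# Assumption: text is scanned in fixed-size chunks, each extended by
# `overlap` extra characters, so a string of up to `overlap` characters
# that starts inside a chunk always fits in that chunk's window.
import re


def find_matches(text: str, pattern: str, chunk: int, overlap: int,
                 limit: int) -> list:
    """Return up to `limit` (offset, match) pairs found chunk by chunk."""
    rx = re.compile(pattern)
    found = []
    for start in range(0, len(text), chunk):
        window = text[start:start + chunk + overlap]
        for m in rx.finditer(window):
            if m.start() >= chunk:
                break  # starts in the overlap: the next chunk owns it
            if len(m.group()) > overlap:
                # Too long for regExpOverlapSize. This also drops any match
                # cut off at the window end: such a match would run from
                # inside the chunk to the window end, i.e. exceed `overlap`.
                continue
            found.append((start + m.start(), m.group()))
            if len(found) >= limit:
                return found  # regExpMatchLimit reached
    return found
```

Because each window is scanned on its own, matches that sit right next to each other across a chunk boundary may come out slightly differently than in a single pass over the whole text.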

© Mirobase