v10.17 (build: May 28 2024)

Search in files

You can enable indexing of document files on client machines so that documents can later be searched quickly by keywords or regular expressions within their content, either via BOSS-Online ("Search in files" function) or in the BOSS-Offline "Search in files" report (when periodic search is enabled).

Attention: make sure that the database user is allowed access to the "Search in files" functions for BOSS-Offline/BOSS-Online on the rights settings page!

Settings

The indexing process may take a long time, and the index itself can use a significant amount of space on the system disk of client machines. The following settings adapt this functionality to different needs:

providers.searchEngine.idlePeriod - time in msec after the last user activity before the scanner enters the active scanning phase
providers.searchEngine.rescanPeriod - time in msec between the completion of one scan cycle and the start of the next
providers.searchEngine.scanRateIdle - scanning intensity while the user is inactive (active phase), from 0.01 to 1.00
providers.searchEngine.scanRateUsed - scanning intensity while the user is active (passive phase), from 0.01 to 1.00
providers.searchEngine.scanRateForced - not used in the current version
providers.searchEngine.scanRateFrozen - not used in the current version
providers.searchEngine.maxFieldLength - max. number of words in a document to analyze
providers.searchEngine.maxDocCharsToAnalyze - max. size of the document retrieved from the index, used to find the best chunk or when directly querying the entire document
providers.searchEngine.maxSizeFile - max. file size to analyze, in bytes
providers.searchEngine.maxSizeNest - max. archive size to analyze, in bytes
providers.searchEngine.maxDepthDir - max. nesting depth of directories/folders when scanning
providers.searchEngine.maxDepthNest - max. nesting depth of archives inside each other
providers.searchEngine.maxSizeIndex - max. index size, in bytes
providers.searchEngine.minIndexFreeSpace - min. free space on the system disk, in bytes, required for scanning and index building to proceed
providers.searchEngine.indexedFileLivetime - lifetime (in msec) of an indexed file in the database after the original file has been deleted. Attention! This parameter should not be less than the duration of a full scan cycle, so it is not recommended to set a value of less than 2-3 days!
providers.searchEngine.indexableFileTypes - indexable file types. The formats supported in the current version are: zip,7z,rar,txt,csv,htm,html,eml,mht,pdf,doc,docx,xlsx,pptx,odt,ods,odp,odg
providers.searchEngine.detectFileType - when enabled, the file type is recognized by its signature rather than by the extension in its name. Attention! Enabling this parameter lengthens the indexing cycle and significantly increases the size of the index on disk!
providers.searchEngine.excludeFileMasks - file masks to exclude from the scanning process
providers.searchEngine.excludeDirMasks - folder names (without paths) to exclude from the scanning process (masks are also allowed)
providers.searchEngine.storeIndexedContent - whether to store the contents of the found fragments in the index (true - increases the size of the index, but allows fragments of the found text to be shown, not just the path to the document; false - does not store them)
providers.searchEngine.includePath.1 - path to scan. Specify an asterisk * to scan all drives (except network drives). To scan a specific logical disk, specify a path of the form \\?\C:\ Environment variables (%variable%) are not allowed here!
providers.searchEngine.includePath.2 - if necessary, you can specify a second, third, etc. path to scan
providers.searchEngine.ignoreRemovableDisks - if set to false, removable disks are also scanned in the all-drives scan mode
providers.searchEngine.minTimestampFile - a date in the format YYYY-MM-DD; files last modified before this date are not indexed
providers.searchEngine.regExpOverlapSize - maximum length of a string that can be found using regular expressions
providers.searchEngine.regExpMatchLimit - maximum number of matches that can be found for one regular expression in one document
providers.searchEngine.textAnalyzer - language analyzer used to break text into terms; possible values: Standard, Czech, Dutch, English, French, German, Russian
providers.searchEngine.fragmentsLimit - maximum number of text fragments that can be returned in the search results for one document

Note: each time these settings are changed, the index is completely rebuilt, i.e. the accumulated data is deleted and the scanning process starts over.

Enable periodic search
When enabled, the query given in "Search query" is run every "Search interval" hours, and the results are sent to the server for the "Search in files" report.

Max search results
How many search results (that is, documents in which the query was found) to use. Results are sorted by relevance. Possible values: from 1 to 255.

Retrospective of results (days)
Once a query has matched a document, there is no point in sending information about that document to the server on every search iteration (this avoids duplicate results in the report). This parameter specifies how many days to keep information about the results already found (if you specify 0, the information is kept indefinitely), i.e. after this number of days, a matching document will be added to the report again.

Search query
The search query is specified here. A line break in the query is equivalent to a space character.
A full description of the query format can be found here: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html, although a slight extension of this syntax is also used.

Examples:

Search for the words test and software in a single document:
    test AND software
Strict comparison search (specified using double quotes):
    "word1 word2"
Search for docx documents that match the regular expression @IPv4@ and contain the word "personal":
    type:docx AND @IPv4@ AND personal
Search for a file by its MD5 checksum:
    digest:3fcdcb42d0797d0b08c52e9d214b4ad2
Search for encrypted files (for example, password-protected archives):
    flags:encrypted OR flags:unsupported

Note: only regular expressions from this list can be searched for, by specifying the desired regular expression in the format @NAME@ (names are case sensitive!).
To match any of the regular expressions ("OR" condition), use @||@
To match all of the regular expressions ("AND" condition), use @&&@
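The query examples above can also be assembled programmatically, which helps when the "Search query" field is filled in from a script. The following is only a minimal sketch in Python: the helper names (phrase, regex_ref, field, all_of, any_of, one_line) are hypothetical and are not part of the product; they simply build strings in the syntax shown in the examples.

```python
def phrase(text: str) -> str:
    """Strict comparison: wrap the words in double quotes."""
    if '"' in text:
        # Escaping rules for the extended syntax are not described on this
        # page, so this sketch refuses embedded quotes instead of guessing.
        raise ValueError("double quotes inside a phrase are not supported here")
    return f'"{text}"'


def regex_ref(name: str) -> str:
    """Reference a predefined regular expression as @NAME@ (case sensitive)."""
    return f"@{name}@"


def field(name: str, value: str) -> str:
    """Field-qualified term such as type:docx or flags:encrypted."""
    return f"{name}:{value}"


def all_of(*terms: str) -> str:
    """Every term must match (AND)."""
    return " AND ".join(terms)


def any_of(*terms: str) -> str:
    """At least one term must match (OR)."""
    return " OR ".join(terms)


def one_line(query: str) -> str:
    """A line break in the query is equivalent to a space character."""
    return " ".join(query.splitlines())


if __name__ == "__main__":
    print(all_of(field("type", "docx"), regex_ref("IPv4"), "personal"))
    print(any_of(field("flags", "encrypted"), field("flags", "unsupported")))
    print(phrase("word1 word2"))
```

Keeping each helper to a single string operation makes it easy to compare the generated query with the documented examples before pasting it into the settings.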
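Because every change to the providers.searchEngine settings triggers a full index rebuild, it can be worth checking values before applying them. The sketch below, in Python, checks only the constraints stated in the Settings list (scan rates from 0.01 to 1.00, the recommended minimum of 2 days for indexedFileLivetime, the YYYY-MM-DD date format, the allowed analyzers, and the ban on environment variables in scan paths). The function name check_settings and the representation of settings as a dictionary of strings are assumptions made for illustration; the product's actual storage format is not described here.

```python
import re
from datetime import datetime

PREFIX = "providers.searchEngine."
MSEC_PER_DAY = 24 * 60 * 60 * 1000
ANALYZERS = {"Standard", "Czech", "Dutch", "English", "French", "German", "Russian"}


def check_settings(settings: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    # Scan intensities must lie between 0.01 and 1.00.
    for key in ("scanRateIdle", "scanRateUsed"):
        raw = settings.get(PREFIX + key)
        if raw is not None and not 0.01 <= float(raw) <= 1.00:
            problems.append(f"{key} must be between 0.01 and 1.00")

    # Lifetime is given in msec; below 2 days is not recommended.
    raw = settings.get(PREFIX + "indexedFileLivetime")
    if raw is not None and int(raw) < 2 * MSEC_PER_DAY:
        problems.append("indexedFileLivetime is below the recommended 2 days")

    # Date must follow the YYYY-MM-DD format.
    raw = settings.get(PREFIX + "minTimestampFile")
    if raw:
        try:
            datetime.strptime(raw, "%Y-%m-%d")
        except ValueError:
            problems.append("minTimestampFile must use the format YYYY-MM-DD")

    # Only the listed analyzers are valid.
    raw = settings.get(PREFIX + "textAnalyzer")
    if raw is not None and raw not in ANALYZERS:
        problems.append("textAnalyzer is not one of the supported analyzers")

    # Scan paths must not contain environment variables such as %TEMP%.
    for key, value in settings.items():
        if key.startswith(PREFIX + "includePath.") and re.search(r"%[^%]+%", value):
            problems.append(f"{key[len(PREFIX):]} must not contain environment variables")

    return problems


if __name__ == "__main__":
    example = {
        PREFIX + "scanRateIdle": "0.50",
        PREFIX + "indexedFileLivetime": str(3 * MSEC_PER_DAY),
        PREFIX + "includePath.1": "*",
    }
    print(check_settings(example))
```

Returning a list of problems rather than raising on the first one lets all issues be fixed in a single settings change, which matters here because each change restarts indexing.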

© Scopd