About Record Store and Record Generations
A set of records that has been committed to a Record Store instance is a record generation.
For example, if you perform a full file system crawl, all the records returned from the crawl are written to the
Record Store and then committed. After the commit, the Record Store has one generation of records.
A subsequent crawl, either full or incremental, results in a second generation of records.
Each record that is read in must contain a unique ID. CAS identifies that ID through the record property named
by the idPropertyName Record Store configuration property.
If a later CAS crawl reads a record whose unique ID already exists in the Record Store, the later version replaces
the earlier one. This ensures that when you run an incremental crawl, you always get the latest version of any
given record.
A record generation is removed from a Record Store instance by the clean task after the generation becomes
stale. A stale generation is a generation that has been in a Record Store instance for a period of time that
exceeds the value of the generationRetentionTime Record Store configuration property.
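These configuration properties and the clean task are managed with the recordstore-cmd utility, which ships alongside cas-cmd in the CAS bin directory. A minimal sketch, assuming a Record Store instance named MyFileCrawl (hypothetical) and the default CAS host and port; exact option names can vary between CAS versions:

C:\Endeca\CAS\11.1.0\bin\recordstore-cmd.bat list-generations -a MyFileCrawl
C:\Endeca\CAS\11.1.0\bin\recordstore-cmd.bat get-configuration -a MyFileCrawl -f recordstore-configuration.xml
C:\Endeca\CAS\11.1.0\bin\recordstore-cmd.bat clean -a MyFileCrawl

list-generations shows the generations currently held by the instance, get-configuration writes the instance configuration (including idPropertyName and generationRetentionTime) to a file, and clean removes generations that have become stale.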
Command for creating a Dimension Value Id Manager
C:\Endeca\CAS\11.1.0\bin\cas-cmd.bat createDimensionValueIdManager -h localhost -p 8500 -m ATGen-dimension-value-id-manager
With cas-cmd, we can perform the following tasks (example invocations appear after the list):
C:\Endeca\CAS\<version>\bin>cas-cmd.bat --help
usage: cas-cmd <task-name> [options]
[Inspecting Installed Modules]
getAllModuleSpecs
getModuleSpec
listModules
[Managing Crawls]
createCrawls
deleteCrawl
getAllCrawls
getCrawl
getCrawlIncrementalSupport
listCrawls
startCrawl
stopCrawl
updateCrawls
[Managing Dimension Value Ids]
createDimensionValueIdManager
deleteDimensionValueIdManager
exportDimensionValueIdMappings
generateDimensionValueId
getDimensionValueId
importDimensionValueIdMappings
[Viewing Crawl Status and Results]
getAllCrawlMetrics
getCrawlMetrics
getCrawlStatus
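As a hedged sketch of typical usage, assuming a crawl named MyFileCrawl (hypothetical) and the same host and port as above; the -id option used to name the crawl reflects common cas-cmd syntax but may differ by version:

C:\Endeca\CAS\11.1.0\bin\cas-cmd.bat listCrawls -h localhost -p 8500
C:\Endeca\CAS\11.1.0\bin\cas-cmd.bat startCrawl -h localhost -p 8500 -id MyFileCrawl
C:\Endeca\CAS\11.1.0\bin\cas-cmd.bat getCrawlStatus -h localhost -p 8500 -id MyFileCrawl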
About index_config_cmd.bat
During baseline update processing, the Content Acquisition System merges and processes index configuration
from all owners into a consolidated set of MDEX-compatible output files.
If multiple owners modify the same attribute, the configuration from the system owner always overrides the
configuration from other owners during the merge process.
The Index Configuration Command-line Utility writes and reads index configuration as JSON. The schema for
the JSON file varies depending on whether you retrieve configuration for one owner or more than one owner
and whether you restrict the types of configuration that you retrieve.
Types of configuration include:
• Endeca properties, derived properties, and dimensions. These are specified under the attributes node.
• Precedence rules. These are specified under the precedenceRules node.
• Search configuration. These are specified under the searchIndexConfig node.
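As a hedged sketch of what this looks like in practice: the get-config task with -o (owner) and -f (output file) retrieves the configuration as JSON, and the skeleton below shows only the three nodes listed above. The product.name attribute and its fields are purely illustrative, and the surrounding wrapper nodes may differ depending on the owner scope you request:

index_config_cmd.bat get-config -o all -f index-config.json

{
  "attributes": {
    "product.name": {
      "propertyDataType": "ALPHA",
      "isRecordSearchEnabled": true
    }
  },
  "precedenceRules": { },
  "searchIndexConfig": { }
}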
CAS-based data processing
The Deployment Template supports running baseline and partial updates using CAS as a replacement for
Forge. In this processing model, the update runs a CAS crawl to produce MDEX-compatible output. This is
the step that removes the need for Forge. Then the update runs Dgidx and updates the Dgraphs in an application.
Dgraph baseline update script using CAS
You do not need to run Forge if you run a CAS crawl that is configured to produce MDEX-compatible output
as part of your update process.
This example runs a baseline update that includes a full CAS crawl. The crawl writes MDEX-compatible output,
and then the update invokes Dgidx to process the records, dimensions, and index configuration produced by
the crawl. To create this sequential workflow of a CAS crawl followed by a baseline update, you add a call to
runBaselineCasCrawl() to run the CAS crawl.
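A minimal sketch of the relevant part of the BaselineUpdate script in DataIngest.xml, assuming the CAS server component is declared with the id CAS and the crawl is named MyFileCrawl (both illustrative); the locking, index distribution, and cleanup steps of the standard Deployment Template script are omitted:

<script id="BaselineUpdate">
  <bean-shell-script>
    <![CDATA[
      log.info("Starting baseline update script.");

      // Run the full CAS crawl that writes MDEX-compatible output.
      // The crawl name argument shown here is illustrative; in some
      // template versions the crawl name is configured on the CAS
      // component instead.
      CAS.runBaselineCasCrawl("MyFileCrawl");

      // Index the records, dimensions, and index configuration
      // produced by the crawl.
      Dgidx.run();

      // The standard script then distributes the new index to the
      // Dgraphs and applies it.
    ]]>
  </bean-shell-script>
</script>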
Initial setup.xml
Importing dimension value Id mappings
The importDimensionValueIdMappings task imports dimension value Id mappings from a CSV file into
a Dimension Value Id Manager. The import completely replaces all dimension value Id mappings stored in
the Dimension Value Id Manager.
The syntax for this task is:
cas-cmd importDimensionValueIdMappings [-h HostName] [-l true|false] -m dvalmgr
[-p PortNumber] -f mappings.csv
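For example, to load mappings into the Dimension Value Id Manager created earlier (the CSV path is illustrative):

C:\Endeca\CAS\11.1.0\bin\cas-cmd.bat importDimensionValueIdMappings -h localhost -p 8500 -m ATGen-dimension-value-id-manager -f C:\temp\dimension-value-mappings.csv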
DataIngest.xml
Specifies the data processing scripts, including the baseline update script, the partial update
script, and the components that perform data processing, such as CAS (or Forge) and Dgidx.
Record_spec.xml
This file contains a RECORD_SPEC element that specifies a property to identify records during partial updates.
When implementing partial updates, the RECORD_SPEC element uses this property to preserve stable record IDs across baseline runs. That is, a record will have the same ID in the next update as in the current update. For more information, see the Endeca MDEX Engine Partial Updates Guide.
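A hedged sketch of such a file; the RECORD_SPEC element comes from the description above, while the NAME attribute and the record.id property are assumptions and may differ in your configuration:

<?xml version="1.0" encoding="UTF-8"?>
<RECORD_SPEC NAME="record.id"/>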