Knowledge Management for Environmental Resources: Data Discovery
This past week we hinted at upcoming posts on the importance of, and methodology for, Knowledge Management as it applies to the Environmental Resources sector and the multitude of data associated with it. The first stop on our Environmental Knowledge Management Express is Data Discovery.
If you have been around these parts long enough, you know that this particular feat is an important - and often overwhelming - one. Before we can properly manage data on an enterprise system, we must first know what is there and, just as importantly, what is not. To understand this fully, we look at the data's metadata and split it into two groups: details about the file itself and details about its GIS components.
Find File Metadata
Before digging into the spatial components of our enterprise data, we like to start simple. We want to build from the ground up, meeting our data in its most basic and most commonly used form before venturing into more convoluted territory: the file itself. The metadata attached to each individual file can provide detailed insights without forcing us to dig into the GIS components, so we recommend beginning the discovery process here. There are plenty of directions in which you may find yourself pulled, but for the sake of time, resources, and sanity, we advise focusing on location, ownership, relevancy, and status.
One of the most important details to gather for spatial data - or really, any data - is where it is located. This is especially true for data with spatial components, as one incorrect move can render Layer Files and Map Documents useless. Knowing the full path of each file is essential to finding, sharing, and evaluating the data at hand. This applies to Layer Files, File Geodatabases, Map Documents, Map Projects, etc.
Understanding where your data lives is handy for distinguishing data types (e.g., Are these types stored where they should be? Does this file extension match what is expected?), discovering broken links (e.g., Where is the new location of the Layer File referenced in that Map Document?), and migrating data across the network. Data migration in particular is a tricky practice, and one that happens often as companies revamp their own processes and structures. Knowing the location of data can help determine where it can be moved and how it should be moved (e.g., Can we move these files individually to the new location? Should we employ a Remap File to help with this process?).
With that in mind, location can also help you discern hot spots on your network. Finding which drives, folders, and containers house the most data can help you determine how much effort it would take to relocate and relearn these systems or, better yet, whether they should remain in place after all.
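The first step of location discovery is building an inventory of full paths. The sketch below shows one way to do this with the Python standard library. It walks a folder tree and records the absolute path of every file whose extension is in a hypothetical `GIS_EXTENSIONS` set. It also treats folders ending in `.gdb` as File Geodatabases. The extension list and the function name are assumptions for illustration, so adjust them to the types your organization actually uses.

```python
import os
from pathlib import Path

# Hypothetical set of file extensions to inventory; adjust to your environment.
GIS_EXTENSIONS = {".lyr", ".lyrx", ".mxd", ".aprx"}

def inventory_gis_files(root):
    """Return sorted absolute paths of GIS files and File Geodatabase folders under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A File Geodatabase is a folder ending in .gdb: record the folder itself
        # and remove it from dirnames so os.walk does not descend into its internals.
        for name in list(dirnames):
            if name.lower().endswith(".gdb"):
                found.append(str(Path(dirpath, name).resolve()))
                dirnames.remove(name)
        # Record individual files whose extension is on the inventory list.
        for name in filenames:
            if Path(name).suffix.lower() in GIS_EXTENSIONS:
                found.append(str(Path(dirpath, name).resolve()))
    return sorted(found)
```

Running `inventory_gis_files(r"\\server\gis")` against a share (the path is a placeholder) would return one absolute path per Layer File, Map Document, Map Project, or File Geodatabase found.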
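During a migration, the underlying idea of remapping is to translate old path prefixes into new ones. The sketch below reads a hypothetical two-column CSV of `old_prefix,new_prefix` rows and rewrites a path using the longest matching prefix. It is an illustration of the concept under that assumed format, not the format of any particular Remap File tool. Matching is case-insensitive, as on Windows, and respects folder boundaries, so a rule for `S:\GIS` does not accidentally rewrite `S:\GISData`.

```python
import csv
import io

def load_remap(csv_text):
    """Parse a hypothetical two-column CSV of old_prefix,new_prefix rows."""
    rules = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        # Trailing separators are stripped so prefixes join cleanly later.
        old = row[0].strip().rstrip("\\/")
        new = row[1].strip().rstrip("\\/")
        if old:
            rules.append((old, new))
    return rules

def _matches(path, prefix):
    """Case-insensitive prefix match that only succeeds at a folder boundary."""
    p, q = path.lower(), prefix.lower()
    return p == q or (p.startswith(q) and p[len(q)] in "\\/")

def remap_path(path, rules):
    """Rewrite path using the longest matching old prefix; return it unchanged if none match."""
    candidates = [(old, new) for old, new in rules if _matches(path, old)]
    if not candidates:
        return path
    old, new = max(candidates, key=lambda rule: len(rule[0]))
    return new + path[len(old):]
```

Preferring the longest prefix lets a specific rule, such as an archived subfolder, override a broader drive-level rule.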
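Finding hot spots amounts to grouping files by their containing folder and ranking the groups. The minimal sketch below groups every file under a root by its top-level subfolder and tallies file counts and total bytes. The function name and the choice of top-level grouping are assumptions. Counting by drive or by deeper folder levels would work the same way.

```python
import os
from collections import Counter
from pathlib import Path

def hot_spots(root, top_n=5):
    """Rank the immediate subfolders of root by file count, also reporting total bytes."""
    root = Path(root)
    counts, sizes = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        # Files sitting directly in root are grouped under ".".
        bucket = rel.parts[0] if rel.parts else "."
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            counts[bucket] += 1
            sizes[bucket] += size
    return [(bucket, counts[bucket], sizes[bucket]) for bucket, _ in counts.most_common(top_n)]
```

Each result is a `(folder, file_count, total_bytes)` tuple, ordered from the busiest folder down.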
Another essential piece of information about your spatial data is its ownership - or the user account with which it is associated. This knowledge is useful in a number of ways, including:
- Point of Contact Provided - In instances where more information is needed about a file, this user may be contacted.
- Evaluating User Responsibility - Whether new members are joining a team and workloads need dividing, or you are simply deciding who should maintain this data, the user account can help determine responsibility.
- Hinting at Maintenance - Discerning ownership may help indicate out-of-date user accounts, unmaintained data, or even third-party sources that may need to be checked for a more recent version.
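The sketch below gathers ownership with the standard library and summarizes how many files each account owns. It relies on `pathlib.Path.owner()`, which depends on the POSIX `pwd` module. Where that lookup is unavailable, or the numeric UID has no matching account, the sketch falls back to the raw UID. On Windows, `st_uid` is not meaningful, so real ownership there would have to come from Windows security APIs, which are not shown. The helper names are hypothetical.

```python
import os
from collections import Counter
from pathlib import Path

def file_owner(path):
    """Return the owning account name, falling back to the numeric UID as a string."""
    p = Path(path)
    try:
        return p.owner()
    except (NotImplementedError, KeyError):
        # owner() is unsupported on platforms without the pwd module, and raises
        # KeyError when the UID has no account entry (e.g. a deleted user).
        return str(os.stat(p).st_uid)

def summarize_owners(paths):
    """Count how many files each owner is associated with."""
    return Counter(file_owner(p) for p in paths)
```

An owner that shows up as a bare number is itself a useful signal for the out-of-date accounts mentioned above.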
Here, relevancy refers to time-based information about the files at hand rather than keyword or term association.
When was it created? When was it last modified? When was it last accessed?
Knowing these dates shows how well files have been maintained, and whether they are in desperate need of an update or simply no longer relevant.
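These dates can be read from the file system with `os.stat`. The minimal sketch below returns modified and accessed times and flags a file as stale when it has not been modified within a hypothetical `stale_after_days` threshold. Two platform caveats apply. First, `st_ctime` has historically reported creation time on Windows but reports the last metadata change on Unix. Second, access times may be unreliable on volumes mounted without access-time updates.

```python
import os
from datetime import datetime, timezone

def relevancy(path, stale_after_days=365, now=None):
    """Collect time-based metadata for path and flag it as stale past a threshold."""
    st = os.stat(path)
    if now is None:
        now = datetime.now(timezone.utc)
    modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    accessed = datetime.fromtimestamp(st.st_atime, tz=timezone.utc)
    # st_ctime: creation time on Windows (historically), metadata-change time on Unix.
    created_or_changed = datetime.fromtimestamp(st.st_ctime, tz=timezone.utc)
    days_since_modified = (now - modified).days
    return {
        "created_or_changed": created_or_changed,
        "modified": modified,
        "accessed": accessed,
        "days_since_modified": days_since_modified,
        "stale": days_since_modified > stale_after_days,
    }
```

Passing `now` explicitly keeps results reproducible when comparing scans.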
The information we gather about location, ownership, and relevancy is just a snapshot in time. These things can and do change; files, especially those that see heavy use, rarely stay static.
As important as it is to know when data was created, it is also important to know when we last checked it, and therefore when we need to run a refresh. This status can then feed your own maintenance practices or even your organization's overall data management plan.
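Tracking status can be as simple as recording when each path was last scanned and listing the paths whose last scan is older than a refresh interval. The sketch below keeps this in a plain dictionary. The 30-day `REFRESH_INTERVAL` and the function names are hypothetical placeholders for whatever your data management plan specifies.

```python
from datetime import datetime, timedelta, timezone

REFRESH_INTERVAL = timedelta(days=30)  # hypothetical policy; set per your data management plan

def record_scan(catalog, path, scanned_at=None):
    """Store the time a path was last scanned (defaults to now, in UTC)."""
    catalog[path] = scanned_at if scanned_at is not None else datetime.now(timezone.utc)

def due_for_refresh(catalog, now=None, interval=REFRESH_INTERVAL):
    """Return sorted paths whose last scan is at least `interval` old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return sorted(path for path, last in catalog.items() if now - last >= interval)
```

In practice the catalog would live in a database rather than in memory, but the comparison stays the same.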
Grasp GIS Metadata
Once you have gathered the above information about your spatial data files, you will likely have everything you need for a standard data management program. To take this one step further - and really ground your data - we can examine the GIS components of each file, which often determine what will and will not be used. These include layer brokenness, layer symbology, and spatial reference.
There is not much worse in a work day than opening your latest ArcMap Document only to be greeted by a chorus of red exclamation points. You know the kind: the ones within the Table of Contents that signify two of the most daunting words in GIS - broken layers.
Broken layers occur when the data source path defined in a layer's properties within ArcMap no longer points to the underlying data. A broken layer is rendered useless and will not draw in the workspace. Because a map cannot do its job while its data is broken, knowing what is broken is extremely helpful.
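Inside ArcMap, ArcPy's `arcpy.mapping.ListBrokenDataSources` reports broken layers directly. The tool-agnostic sketch below assumes the map-to-layer-to-source records have already been exported into plain Python, using a hypothetical `LayerRecord` type. It flags every record whose data source path does not exist. This simple existence check only suits file-based sources. A feature class addressed through an `.sde` connection file is not a plain path and would be misreported.

```python
import os
from typing import NamedTuple

class LayerRecord(NamedTuple):
    """Hypothetical exported record: which map, which layer, and its source path."""
    map_document: str
    layer_name: str
    data_source: str

def find_broken(records, exists=os.path.exists):
    """Return records whose data source cannot be found at the stored path."""
    return [record for record in records if not exists(record.data_source)]
```

The `exists` parameter is injectable, so the check can be tested offline or swapped for a network-aware lookup.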
Recently moved a drive? You likely have broken data.
Moving SDE instances? You likely are seeing red...exclamation points, that is.
Referencing the C:/ drive? You guessed it. Broken data is likely.
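The three scenarios above suggest a rough triage of broken sources. The sketch below uses simple string heuristics. A path containing an `.sde` connection is treated as an SDE issue, and a path on the local `C:` drive as a local-drive issue. A path under any caller-supplied `moved_prefixes` (for example, a drive letter that was recently retired) is labeled a moved drive. These rules are hypothetical and only hint at a cause; they do not prove one.

```python
def likely_cause(data_source, moved_prefixes=()):
    """Guess why a data source is broken using simple, hypothetical path heuristics."""
    # Normalize separators and case so C:/ and C:\ paths compare alike.
    src = data_source.replace("/", "\\").lower()
    if ".sde\\" in src or src.endswith(".sde"):
        return "sde connection"
    if src.startswith("c:\\"):
        return "local c: drive"
    for prefix in moved_prefixes:
        if src.startswith(prefix.replace("/", "\\").lower()):
            return "moved drive"
    return "unknown"
```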
Knowing which MXDs are not working, as well as the files responsible, helps you better prioritize clean-up and more accurately define data standards.
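One straightforward way to prioritize clean-up is to count broken layers per map document and per missing source. A single missing source that breaks many maps is usually the best first fix. The sketch below assumes a hypothetical input of `(map_document, data_source)` pairs for broken layers.

```python
from collections import Counter

def cleanup_priorities(broken):
    """Rank maps and missing sources by how many broken layers they account for.

    broken: iterable of (map_document, data_source) pairs (hypothetical input format).
    """
    broken = list(broken)
    by_map = Counter(map_doc for map_doc, _ in broken)
    by_source = Counter(source for _, source in broken)
    return by_map.most_common(), by_source.most_common()
```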
Another component specifically related to spatial data is that of symbology. Because many organizations have clearly defined standards on how data is represented, this is often an essential piece of the GIS puzzle.
Thankfully, the symbology of maps and layers alike can be gathered and stored for further uses, such as:
- Developing a repository of symbology, particularly for commonly used assets.
- Facilitating the development and enforcement of company-wide standards and representation.
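Once symbology is collected, both uses above reduce to simple comparisons. The sketch below assumes each layer's symbology has been exported as a dictionary with `asset_type`, `color`, and `width` keys. It checks each layer against a hypothetical `STANDARDS` table and groups the distinct symbologies per asset type into a repository. The field names and standard values are illustrative only, not from any real symbology schema.

```python
# Hypothetical company standard, keyed by asset type.
STANDARDS = {
    "water_main": {"color": (0, 112, 255), "width": 2.0},
    "sewer_main": {"color": (168, 112, 0), "width": 1.5},
}

def check_symbology(layer):
    """Return a list of deviations from the standard for this layer's asset type."""
    expected = STANDARDS.get(layer["asset_type"])
    if expected is None:
        return ["no standard defined"]
    problems = []
    for key, value in expected.items():
        actual = layer.get(key)
        if actual != value:
            problems.append(f"{key}: expected {value}, found {actual}")
    return problems

def build_repository(layers):
    """Group the distinct (color, width) symbologies found for each asset type."""
    repo = {}
    for layer in layers:
        repo.setdefault(layer["asset_type"], set()).add((layer.get("color"), layer.get("width")))
    return repo
```

An asset type with several entries in the repository is a good candidate for a standards discussion.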
Once data is located, further details on spatial reference, extent, and transformation may be collected. This information is useful for any number of things. For instance, you can ensure datasets are used properly (e.g., the extent of a dataset falls within the extent of the data frame). You can better enforce corporate standards by ensuring valid transformations are used where applicable. Spatial reference is also valuable when searching these datasets.
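The two checks described above can be sketched with plain values. Spatial references are identified here by WKID, extents as `(xmin, ymin, xmax, ymax)`, and approved transformations come from a hypothetical table keyed by `(from_wkid, to_wkid)`. Extents in different coordinate systems are not directly comparable. The sketch therefore compares extents only when the dataset and data frame share a WKID, and otherwise checks only that the chosen transformation is approved. The transformation names in the test are placeholders, not real transformation identifiers.

```python
from typing import NamedTuple

class Extent(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def within(inner, outer):
    """True if the inner extent lies entirely inside the outer extent."""
    return (inner.xmin >= outer.xmin and inner.ymin >= outer.ymin
            and inner.xmax <= outer.xmax and inner.ymax <= outer.ymax)

def check_dataset(ds_wkid, ds_extent, frame_wkid, frame_extent,
                  transformation=None, approved=None):
    """Return a list of issues for one dataset placed in one data frame."""
    approved = approved or {}  # hypothetical {(from_wkid, to_wkid): {names}} table
    issues = []
    if ds_wkid == frame_wkid:
        if not within(ds_extent, frame_extent):
            issues.append("extent outside data frame")
    else:
        # Different coordinate systems: only the transformation choice is checked.
        allowed = approved.get((ds_wkid, frame_wkid), set())
        if transformation not in allowed:
            issues.append(f"transformation {transformation!r} not approved for {ds_wkid}->{frame_wkid}")
    return issues
```

For example, a WGS 84 dataset (WKID 4326) drawn in a NAD 1983 UTM Zone 15N frame (WKID 26915) passes only if its transformation appears in the approved table.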
Is That All?
The type of information that helps you and your team better understand your enterprise data will likely differ based on your organization, project, and availability in general. With that said, the above components serve as handy jumping off points to dig deeper and fully grasp what is and is not at your disposal.
About This Series
These posts provide tips on the management of enterprise data within Environmental Resources and the like. Other posts in this series discuss the importance of Knowledge Management, storing spatial data details in a results database, and analyzing the data knowledge we gather.