Spatial Data Diagnosis: Matchmaking Files by Keywords with Marco

iii
May 15, 2017

Blog series on reviewing spatial data and identifying broken files and issues, including matching datasets and files based on specific keywords with Integrated Marco Studio software.

Imagine you are perusing the spatial data housed on your network. There are thousands upon thousands of files to sort through, but you only want to know one thing: which of these files are associated with a particular set of words?

It is a tale as old as time, and not at all an unreasonable request to make. Within a network, we have various methods for making data easier to find. These often include implementing standardization practices, requiring naming schemes, or nesting file folders. These practices work to a basic extent, but how do we get more out of our data? How can we make the process of finding data based on keywords a bit more...dare I say it...helpful?

Integrated Marco Studio offers several approaches for answering this question, this question of…

How do I find which datasets are associated with specific keywords?

The Obvious – Search, Search, Search

When narrowing down your data inventory based on terms or keywords, the approach that yields immediate, interactive results is Integrated Marco Mystic’s search functionality. I love stating the obvious sometimes, and this approach...well, it is a no-brainer.

View of search engine within Integrated Marco Mystic web application for displaying and organizing ArcGIS spatial data.

As the example shows, the Integrated Marco Mystic interface allows for easy search and filtering of your inventory. It provides an immediate answer to that question of old.

The More Advanced – Categorize It

This next approach may not be as shiny as that search interface, but it is handy for very similar reasons. Not only does it let you narrow your inventory based on terminology, but it also gives you a record of those terms, along with statistics for each dataset and keyword.

That approach? Categorize.

Integrated Marco Commander offers the marco categorize tool for just this purpose. Whether you deploy it in Integrated Marco Commander’s standard command-line prompt or in the sleek Integrated Marco Mystic admin interface, it will have the same impact. As in, it will categorize the crayons out of your data inventory.

Defining Data Categories

To begin, the tool requires a Data Categories Spreadsheet. This is a simple file listing the Class and Category into which data should be divvied. Within each Category, the user goes on to define the SubType (e.g., Polygon), characters to Exclude, and one or more Word specifications. Datasets that match the listed Word entries are recorded under the corresponding Class/Category combination.

Why the spreadsheet format? Why not just type it all word by word?

Come on, guys and gals. No one wants carpal tunnel here.

Using a spreadsheet to define data categories accomplishes multiple things. First, it ensures you are as thorough as possible. Have five separate words you want to catch for “Administrative Boundaries”? List them in the Word1, Word2, etc. columns so that the tool scans for them. Second, it provides a record of the categories and classes used. This is beneficial both for the data review process and for ensuring you have your ducks in a row. Finally? We assume you are working with a mountain, not a molehill, of data. With the categories saved in a spreadsheet, you can run them against your data as many times as you like without repeatedly filling out forms. Win-win.
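
To make the spreadsheet idea a little more concrete, here is a minimal Python sketch of reading such a file. It is not the tool's own code or file format. The column names (Class, Category, SubType, Exclude, Word1, Word2, ...) follow the description above, but the sample rows, the CSV layout, and the names `CategoryRule` and `load_rules` are assumptions made purely for illustration.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical sample: column names follow the post, the rows are invented.
SAMPLE_CSV = """Class,Category,SubType,Exclude,Word1,Word2,Word3
Boundaries,Administrative Boundaries,Polygon,_old,county,parish,district
Transportation,Roads,Polyline,,road,street,
"""


@dataclass
class CategoryRule:
    """One spreadsheet row: where matches go and which words to look for."""
    data_class: str
    category: str
    subtype: str
    exclude: str
    words: list[str] = field(default_factory=list)


def load_rules(text: str) -> list[CategoryRule]:
    """Parse CSV text into rules, gathering every non-empty WordN column."""
    rules = []
    for row in csv.DictReader(io.StringIO(text)):
        words = [
            value.strip().lower()
            for key, value in row.items()
            if key and key.startswith("Word") and value and value.strip()
        ]
        rules.append(
            CategoryRule(
                data_class=row["Class"],
                category=row["Category"],
                subtype=row["SubType"],
                exclude=row["Exclude"] or "",
                words=words,
            )
        )
    return rules


rules = load_rules(SAMPLE_CSV)
for rule in rules:
    print(rule.data_class, "/", rule.category, "->", rule.words)
```

Because the words live in columns rather than in code, adding a sixth keyword for a category is a matter of filling in one more cell.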

Match-Making Marco

While it is true data can come in all shapes and sizes, the results of categorization through Integrated Marco Commander may come in only two – a series of Excel files or a Microsoft Access Database.

Datasets found within the inventory are scanned against the defined categories. All those whose data types and details match the SubType and Words defined are rounded up, and their information is stored in a series of fields.
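
The scan described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Integrated Marco Commander actually works: it assumes matching is a case-insensitive substring test on the dataset name, that Exclude disqualifies any name containing those characters, and that the score is simply the share of a category's words that were found. The rule, the inventory entries, and the function names are all hypothetical.

```python
# Hypothetical rule, mirroring one spreadsheet row.
rule = {
    "class": "Boundaries",
    "category": "Administrative Boundaries",
    "subtype": "Polygon",
    "exclude": "_old",
    "words": ["county", "parish", "district"],
}

# Hypothetical inventory entries: (dataset name, data type).
inventory = [
    ("County_Boundaries", "Polygon"),
    ("Parish_District_Lines", "Polygon"),
    ("County_Boundaries_old", "Polygon"),
    ("County_Roads", "Polyline"),
]


def match_dataset(name, data_type, rule):
    """Return the rule words found in the dataset name, or [] if no match."""
    lowered = name.lower()
    # Assumption: the data type must equal the rule's SubType.
    if data_type.lower() != rule["subtype"].lower():
        return []
    # Assumption: Exclude rules out any name containing those characters.
    if rule["exclude"] and rule["exclude"].lower() in lowered:
        return []
    return [word for word in rule["words"] if word.lower() in lowered]


def score(matched, rule):
    """Assumed scoring: share of the rule's words found in the name."""
    return len(matched) / len(rule["words"]) if rule["words"] else 0.0


matches = {}
for name, data_type in inventory:
    found = match_dataset(name, data_type, rule)
    if found:
        matches[name] = (found, round(score(found, rule), 2))

for name, (found, rank) in matches.items():
    print(name, found, rank)
```

In this sketch, a dataset with the wrong data type or an excluded suffix drops out before any words are checked, which keeps the match list limited to datasets that satisfy the whole definition.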

For Excel files, each class is given its own workbook. For Access, all results are stored in a single table.

An example of this Access table is shown above. As you can see, the prime takeaways from these details include tidbits like:

Dataset Name + Data Source
Terms matched for each Dataset Name
Health of dataset (e.g., broken?)
Data type
Class and Category Names
Rank/Score (e.g., how well they match definitions)
Vendor data
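
For a feel of the two output shapes, one file per class versus a single table, here is a small Python sketch. It deliberately swaps in stand-ins: CSV files instead of Excel workbooks and an in-memory SQLite table instead of a Microsoft Access database. The field names are assumptions loosely based on the list above, and the sample records are invented.

```python
import csv
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical field names, loosely following the takeaways listed above.
FIELDS = ["dataset_name", "data_source", "matched_terms", "is_broken",
          "data_type", "data_class", "category", "score", "vendor"]

# Invented sample records.
results = [
    dict(zip(FIELDS, ["County_Boundaries", "admin.gdb", "county", 0,
                      "Polygon", "Boundaries", "Administrative Boundaries", 0.33, ""])),
    dict(zip(FIELDS, ["Main_Streets", "roads.gdb", "street", 0,
                      "Polyline", "Transportation", "Roads", 0.5, ""])),
    dict(zip(FIELDS, ["Parish_Lines", "admin.gdb", "parish", 1,
                      "Polygon", "Boundaries", "Administrative Boundaries", 0.33, ""])),
]


def write_per_class(rows, out_dir):
    """Stand-in for 'one workbook per class': one CSV file per class."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["data_class"]].append(row)
    paths = []
    for data_class, class_rows in sorted(grouped.items()):
        path = Path(out_dir) / f"{data_class}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(class_rows)
        paths.append(path)
    return paths


def write_single_table(rows, conn):
    """Stand-in for 'one Access table': every record in a single SQLite table."""
    conn.execute(f"CREATE TABLE results ({', '.join(FIELDS)})")
    placeholders = ", ".join(f":{name}" for name in FIELDS)
    conn.executemany(f"INSERT INTO results VALUES ({placeholders})", rows)
    conn.commit()


with tempfile.TemporaryDirectory() as tmp:
    per_class_files = [path.name for path in write_per_class(results, tmp)]

conn = sqlite3.connect(":memory:")
write_single_table(results, conn)
row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
broken = [r[0] for r in conn.execute(
    "SELECT dataset_name FROM results WHERE is_broken = 1")]
print(per_class_files, row_count, broken)
```

The single-table shape makes cross-class questions (say, every broken dataset regardless of class) a one-line query, while the per-class files are easier to hand to whoever owns that class of data.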

Categorizing data in this way provides many of the same details that you would receive during Integrated Marco Mystic's search. However, it also packages these records into a neat little format better suited for analysis and reporting. This allows you to evaluate the inclusiveness of your inventory based on your standards at the time.

After all, what more would you ask of a matchmaker other than finding those datasets you desire?

Explore the Series

When it comes to reviewing your Geographic Information System (GIS) data and inventories, there are questions you should have at the ready. Discover common questions we ask of our spatial data to get the most out of these resources and, better yet, how we actually get answers. Explore the cheat sheet here and dive into each post in the series below.