Marco the Matchmaker
Imagine you are perusing the spatial data housed on your network. There are thousands upon thousands of files to sort through, but you only want to know one thing: which of these files are associated with a particular set of words?
It is a tale as old as time, and not at all an unreasonable request to make. Within a network, we have various methods for making data easier to find. These often include standardization practices, such as required naming schemes or nested file folders. These practices work to a basic extent, but how do we get more out of our data? How can we make finding data by keyword a bit more… dare I say it… helpful?
Integrated Marco Studio offers several approaches for answering this question, this question of…
How do I find which datasets are associated with specific keywords?
The Obvious – Search, Search, Search
When narrowing down your data inventory based on terms or keywords, the approach that yields immediate, interactive results is Integrated Marco Mystic’s search functionality. I love stating the obvious sometimes, and this approach… well, it is a no-brainer.
As shown in the example below, the Marco Mystic interface allows for easy search and filtering of your inventory. It provides an immediate answer to that question of old.
The More Advanced – Categorize It
This next approach may not be as shiny as that search interface, but it is handy for very similar reasons. Not only does it let you narrow your inventory based on terminology, but it also gives you a record of those terms, along with statistics for each dataset and keyword.
That approach? Categorize.
Integrated Marco Commander offers the marco categorize tool for just this purpose. Whether you deploy it in Marco Commander’s standard command-line prompt or in the sleek Marco Mystic admin interface, it will have the same impact. As in, it will categorize the crayons out of your data inventory.
Defining Data Categories
To begin, the tool requires a Data Categories Spreadsheet. This is a simple file listing the Classes and Categories into which data should be divvied up. For each Category, the user then defines the SubType (e.g., Polygon), any characters to Exclude, and one or more Word entries. Datasets that match the listed Word entries are recorded under the corresponding Class/Category combination.
Why the spreadsheet format? Why not just type it all word by word?
Come on, guys and gals. No one wants carpal tunnel here.
Using a spreadsheet to define data categories accomplishes multiple things. First, it helps you be as thorough as possible. Have five separate words you want to catch for “Administrative Boundaries”? List them in the Word1, Word2, etc. columns so the scan checks for each one. Second, it provides a record of the categories and classes used. This helps both during data review and in making sure you have your ducks in a row. Finally? We assume you are working with a mountain – not a molehill – of data. Because your definitions live in a reusable file, you can run the categorization against your data multiple times without filling out forms again. Win-win.
While data can come in all shapes and sizes, the results of categorization through Integrated Marco Commander come in only two: a series of Excel files or a Microsoft Access database.
Datasets found within the inventory are scanned against their defined categories. Those whose data types and details match the defined SubType and Words are rounded up, and their information is stored in a series of fields.
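The internals of marco categorize aren't described here, so what follows is only a minimal Python sketch of the general idea, not the tool's actual logic. It assumes the spreadsheet has been exported to CSV with the headers Class, Category, SubType, Exclude, Word1, Word2, and so on; that Exclude holds semicolon-separated strings which disqualify a dataset name; and that a dataset matches when its SubType equals the rule's and its name contains at least one Word. The names `CategoryRule`, `load_rules`, and `match` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CategoryRule:
    """One row of a hypothetical Data Categories Spreadsheet exported to CSV."""
    cls: str             # "Class" column
    category: str        # "Category" column
    subtype: str         # "SubType" column, e.g. "Polygon"
    exclude: list[str]   # "Exclude" column (assumed semicolon-separated)
    words: list[str]     # Word1, Word2, ... columns, lowercased


def load_rules(csv_text: str) -> list[CategoryRule]:
    """Build one rule per row; any column whose header starts with 'Word' is a term."""
    rules = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        words = [v.strip().lower() for k, v in row.items()
                 if k and k.startswith("Word") and v and v.strip()]
        exclude = [e.strip().lower()
                   for e in (row.get("Exclude") or "").split(";") if e.strip()]
        rules.append(CategoryRule(row["Class"], row["Category"], row["SubType"],
                                  exclude, words))
    return rules


def match(rule: CategoryRule, name: str, subtype: str) -> list[str]:
    """Return the rule's words found in a dataset name; an empty list means no match."""
    if subtype.lower() != rule.subtype.lower():
        return []                                  # wrong geometry type
    lowered = name.lower()
    if any(ex in lowered for ex in rule.exclude):
        return []                                  # assumed: Exclude disqualifies
    return [w for w in rule.words if w in lowered]


SHEET = """Class,Category,SubType,Exclude,Word1,Word2,Word3
Boundaries,Administrative Boundaries,Polygon,_old,county,district,parish
"""

rules = load_rules(SHEET)
print(match(rules[0], "County_Lines_2020", "Polygon"))  # ['county']
print(match(rules[0], "county_lines_old", "Polygon"))   # []
print(match(rules[0], "County_Centroids", "Point"))     # []
```

Plain substring matching keeps the sketch short; a real implementation might tokenize names or weigh partial matches differently.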
For Excel files, each class is given its own workbook. For Access, all results are stored in a single table.
An example of this Access table is shown above. As you can see, the prime takeaways from these details include tidbits like:
- Dataset Name + Data Source
- Terms matched for each Dataset Name
- Dataset health (e.g., is it broken?)
- Data type
- Class and Category Names
- Rank/Score (i.e., how well each dataset matches the definitions)
- Vendor data
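How the tool actually computes its Rank/Score isn't stated, so the following Python sketch assumes one simple possibility: the share of a category's Word entries that a dataset matched. `MatchRecord` and `group_by_class` are hypothetical names, and the record carries only a few of the fields listed above. Grouping records by class mirrors the one-workbook-per-class Excel output, while the flat `records` list corresponds to the single Access table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MatchRecord:
    """Hypothetical result row loosely mirroring some of the fields listed above."""
    dataset_name: str
    data_source: str
    data_type: str
    cls: str
    category: str
    terms_matched: list[str]
    total_terms: int      # number of Word entries defined for the category
    broken: bool = False  # stand-in for the dataset-health field

    @property
    def score(self) -> float:
        """Assumed score: fraction of the category's words that matched (0.0 to 1.0)."""
        if self.total_terms == 0:
            return 0.0
        return len(self.terms_matched) / self.total_terms


def group_by_class(records: list[MatchRecord]) -> dict[str, list[MatchRecord]]:
    """One bucket per class (like per-class workbooks), highest score first."""
    groups: dict[str, list[MatchRecord]] = defaultdict(list)
    for rec in records:
        groups[rec.cls].append(rec)
    return {cls: sorted(recs, key=lambda r: r.score, reverse=True)
            for cls, recs in groups.items()}


records = [
    MatchRecord("County_Lines", "boundaries.gdb", "Polygon",
                "Boundaries", "Administrative Boundaries", ["county"], 3),
    MatchRecord("District_Parish", "boundaries.gdb", "Polygon",
                "Boundaries", "Administrative Boundaries", ["district", "parish"], 3),
]

for cls, recs in group_by_class(records).items():
    for r in recs:
        print(cls, r.dataset_name, round(r.score, 2))
# Boundaries District_Parish 0.67
# Boundaries County_Lines 0.33
```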
Categorizing data in this way provides many of the same details you would receive from a Marco Mystic search. However, it also packages these records into a neat little format better suited for analysis and reporting. This lets you evaluate how inclusive your inventory is, measured against your standards at the time.
After all, what more would you ask of a matchmaker other than finding those datasets you desire?
About This Series
In the coming weeks, we will explore ways to answer these common questions and get the most out of your spatial data. Want to jump ahead and tackle these for yourself? Follow the links below or view the cheat sheet here.
- Week 1 - What user is responsible for the most broken map documents and layer files? Hint: Gerta may be the culprit.
- Week 2 - What are the most commonly referenced datasets in broken map documents? Heat maps are friends, not foes.
- Week 3 - How do I find which datasets are associated with specific keywords? Take matchmaking to a new level.
- Week 4 - Which maps have not been accessed for more than five weeks? Data age is just a number.
- Week 5 - How many maps are using non-standard geographic transformations? Maybe more than you think.
- Week 6 - What are the extents for the most used datasets? For all extents and purposes, it's a must-know.
- Week 7 - What data matches x and y terms? Get your search on.