When I first opened the KDResearch software, with 5200 Kindle sub-categories, I thought the database was pretty thorough. For each test we did, the database seemed complete.
Then yesterday, I started stumbling over categories that are not there.
For example, “Naturopathy” is a category of books I am personally interested in as a reader. There are only two Naturopathy categories in my database, and nearly a dozen inside Amazon.
That gap flagged a shortcoming in my database.
My goal is to be 99.5% accurate in my representation of Amazon sub-categories. I’d say 100%, but Amazon techs are adding new sub-categories daily, and I simply cannot stay ahead of what is done on their side in real time.
Here is the shocking reality about Amazon. I just finished the process of cataloging the entire Amazon directory tree — including physical products. The physical products category tree will be going into a different website.
The final count is done. There are more than 75,000 individual sub-categories within Amazon. And as many of us have discovered, some of the sub-sub-categories were incomplete on the original crawl.
I am in the process of adding two significant upgrades to the software, which I do believe will allow me to fill out more of those missing categories.
New Tool #1 will allow you to type in a keyword search query. The software will find the top 100 best-selling books related to that keyword, then capture the categories listed for those 100 books and document them for us.
This will allow me to capture all of the categories related to a specific keyword, and it will allow me to capture many of the missing categories.
New Tool #2 will allow the same process, but with a sub-category as a starting point. It will capture the top 100 books, and then map the categories used with those books.
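To make the idea behind both tools concrete, here is a rough sketch in Python of the mapping step: take a list of books, collect the categories each one is listed under, and flag any category the database does not already know. This is not the actual KDResearch code. The names `map_categories`, `missing_categories`, and `fetch_categories` are hypothetical, and the lookup is passed in as a plain function so no particular Amazon API call is assumed.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence


def map_categories(
    book_ids: Iterable[str],
    fetch_categories: Callable[[str], Sequence[str]],
) -> Counter:
    """Count how many of the given books are listed under each category."""
    counts: Counter = Counter()
    for book_id in book_ids:
        # A book can appear in several categories; count each once per book.
        for path in set(fetch_categories(book_id)):
            counts[path] += 1
    return counts


def missing_categories(counts: Counter, known: set) -> list:
    """Return the categories seen on the books but absent from the database."""
    return sorted(path for path in counts if path not in known)


# Made-up sample data standing in for the top books of one search.
sample_books = {
    "book-1": ["Health > Naturopathy", "Health > Herbal Remedies"],
    "book-2": ["Health > Naturopathy"],
}
counts = map_categories(sample_books, sample_books.__getitem__)
already_known = {"Health > Naturopathy"}
print(missing_categories(counts, already_known))
```

The only difference between the two tools in this sketch would be where the list of books comes from: a keyword search for Tool #1, or a sub-category listing for Tool #2.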
Both tools are going to let me bring in categories that I might have missed on the first crawl. That crawl was done with the Amazon API, so it should have been complete, because the information was coming directly from Amazon.
Both of these tools will be found under the category “Stacked Searches” when they are complete. They are going there because getting the data for one keyword takes 110 Amazon API calls, and Amazon currently throttles my requests to one per second. So each “Stacked Search” will require nearly two minutes of crawl time.
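The arithmetic is simple: 110 calls at one call per second is 110 seconds, or 1 minute 50 seconds. A minimal sketch of that kind of throttling in Python might look like the following. The function names are hypothetical and this is only an illustration of one-call-per-second pacing, not the software's real request loop. The `sleep` and `clock` parameters exist only so the pacing can be checked without actually waiting.

```python
import time


def estimated_crawl_seconds(calls: int, calls_per_second: float = 1.0) -> float:
    """Rough crawl time for a number of API calls at a fixed rate limit."""
    return calls / calls_per_second


def run_throttled(calls, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Run each callable in order, starting them at least min_interval apart."""
    results = []
    next_allowed = clock()
    for call in calls:
        now = clock()
        if now < next_allowed:
            # Too early for the next request: wait out the remainder.
            sleep(next_allowed - now)
        next_allowed = max(now, next_allowed) + min_interval
        results.append(call())
    return results


print(estimated_crawl_seconds(110))  # 110 calls at one per second
```

Because the first call goes out immediately, a loop like this spends 109 seconds waiting across 110 calls, which matches the "nearly two minutes" figure.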
Because of the throttling and the volume of requests I expect on those tools, I have set it up this way: you make the initial request for the data, the software does the first ten API calls, and then it returns you to the Search page, where you will see a list of “Your Searches”. Each keyword search request is attributed to an individual member, so that no one else can see what you are researching.
So, on the Search page, you will have a Search Box and a historical list of searches “you” have conducted previously. When the final piece of data has been collected, you will be able to re-open the search page and see a link to the results you wanted.
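For anyone curious how a workflow like this can be organized, here is a small Python sketch. It assumes a simple in-memory queue where each search belongs to one member, the first ten calls are done at submission time, and the remaining calls are worked off later. The class and method names (`StackedSearch`, `SearchQueue`, `submit`, `searches_for`, `work`) are hypothetical and not taken from the real software. Only the numbers 10 and 110 come from the description above.

```python
from dataclasses import dataclass

FIRST_BATCH = 10   # calls made before returning to the Search page
TOTAL_CALLS = 110  # calls needed for one stacked search


@dataclass
class StackedSearch:
    member_id: str
    keyword: str
    calls_done: int = 0

    @property
    def complete(self) -> bool:
        return self.calls_done >= TOTAL_CALLS

    def advance(self, n: int) -> None:
        """Record n more finished API calls, capped at the total."""
        self.calls_done = min(TOTAL_CALLS, self.calls_done + n)


class SearchQueue:
    def __init__(self):
        self._searches = []

    def submit(self, member_id: str, keyword: str) -> StackedSearch:
        """Register a search and do the first batch of calls right away."""
        search = StackedSearch(member_id, keyword)
        search.advance(FIRST_BATCH)
        self._searches.append(search)
        return search

    def searches_for(self, member_id: str) -> list:
        """A member's own search history; other members' searches stay hidden."""
        return [s for s in self._searches if s.member_id == member_id]

    def work(self, budget: int) -> None:
        """Spend up to `budget` API calls on unfinished searches, oldest first."""
        for search in self._searches:
            if budget <= 0:
                break
            if search.complete:
                continue
            step = min(budget, TOTAL_CALLS - search.calls_done)
            search.advance(step)
            budget -= step
```

In this sketch, the results link would be shown for a search only once its `complete` property is true.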
The book-related data I gather will only be updated once a day.
As I am able to archive more and more book data, some of the searches will go more quickly, because I will be able to pull from archived data instead of fresh Amazon API calls.
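The speed-up from archived data can be pictured as a cache with a one-day shelf life. Here is a minimal Python sketch under that assumption. The `DailyCache` class and its `fetch` parameter are hypothetical, and the real archive is presumably a database rather than an in-memory dictionary.

```python
import time

ONE_DAY = 24 * 60 * 60  # seconds


class DailyCache:
    """Serve archived values and only re-fetch when they are a day old."""

    def __init__(self, fetch, max_age=ONE_DAY, clock=time.time):
        self._fetch = fetch      # function that makes the real (slow) call
        self._max_age = max_age
        self._clock = clock
        self._store = {}         # key -> (fetched_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]        # fresh enough: no API call needed
        value = self._fetch(key)
        self._store[key] = (now, value)
        return value
```

Every lookup answered from the archive is one fewer throttled call, which is why repeat searches would finish sooner as the archive grows.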
The bottom line on “Stacked Searches” is that I can use it to fill out more of the missing categories, but for you, it brings an added advantage. “Stacked Searches” will allow you to see what categories are serving the Top 100 Best Selling Books on a keyword search or sub-category search, giving you the ability to dig even deeper into the data.
When request volume is high, “Stacked Searches” might take a few minutes to yield results. When request volume is low, I should be able to get it down to close to a minute.
Later in the week, I plan on adding a submission box that lets us submit sub-categories that are missing from the database, so they can be added and queued for a re-crawl.