Search Engine Scraper by Creative Bear Tech Change Log
Our tech wizards are constantly making new updates to our Search Engine Scraper. Below you will see the latest changes to the software. If you have purchased a copy of the software, you will be able to update your software automatically. You do not need to download any additional files.
- Added licensing
- Created "Creative Bear Tech Manager" that will keep the application running in case it's interrupted.
- Added auto-update
- Updated the application GUI
- Updated the LinkedIn, Trustpilot, and DuckDuckGo scrapers.
- Added the XEvil service for solving captchas.
- Improved scraping speed by updating the multithreading mechanism.
- The user can now control the number of threads the scraper runs.
- Enhanced the proxy routing mechanism: each sub-scraper form has its own proxy.
- Updated the Google Maps scraper
- Added a setting for enabling/disabling the application logs (disabling them can improve the scraper's speed)
- Updated the Google Maps scraper
- Simplified the footprints tool: keywords are now merged with one or more footprints.
- Added a public proxies scraper tool that auto-checks and verifies the public proxies, automatically removes non-working proxies and scrapes new proxies every X number of minutes.
- Added business name scraping: the name of the business is now included in the results.
- Added an option to disable the real-time view of results; this option reduces the consumption of processing power.
- Enhanced the automatic solving of captchas
Upcoming Updates - December 2019
TURBO SPEED settings
The Problem: Many users have told us that, when they use a large keyword list, the scraper spends much of its time skipping duplicate websites. This appears to be the case for keywords that are closely interrelated. For example, if you search for bitcoin and then bitcoin news, bitcoin price, bitcoin blog, bitcoin mining, bitcoin miners, bitcoin mining software, you would get a lot of duplicates and the scraper would waste a lot of time and resources skipping duplicate sites instead of finding new ones.
The Solution: under the "Speed" settings, we are going to add a smart performance option (Turbo Speed) labelled "[icon of a clock] [check box] Skip the processing of a keyword if no results have been found after [number] of pages/SERPs (recommended for increasing the scraping speed)". This setting will significantly increase the scraping speed by preventing the web scraper from spending time on processing and discarding duplicate websites.
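The Turbo Speed idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the application's actual code: `scrape_keyword`, `fetch_page` and `skip_after_empty_pages` are hypothetical names, and "no results" is read here as "no new (non-duplicate) websites on a number of consecutive pages".

```python
from typing import Callable, Iterable, List, Set


def scrape_keyword(
    keyword: str,
    fetch_page: Callable[[str, int], Iterable[str]],
    seen_sites: Set[str],
    max_pages: int = 10,
    skip_after_empty_pages: int = 2,
) -> List[str]:
    """Collect new sites for one keyword and give up early on stale keywords.

    fetch_page(keyword, page_number) is a hypothetical callback returning the
    site URLs found on one results page. seen_sites is shared across keywords
    so that duplicates from earlier keywords are recognised.
    """
    new_sites: List[str] = []
    empty_streak = 0  # consecutive pages that produced nothing new
    for page in range(1, max_pages + 1):
        found_new = False
        for site in fetch_page(keyword, page):
            if site not in seen_sites:
                seen_sites.add(site)
                new_sites.append(site)
                found_new = True
        empty_streak = 0 if found_new else empty_streak + 1
        # Assumed Turbo Speed rule: a value of 0 disables the early exit.
        if skip_after_empty_pages > 0 and empty_streak >= skip_after_empty_pages:
            break
    return new_sites
```

With a setting of 2, a keyword such as bitcoin news that only returns sites already found for bitcoin would be dropped after two pages instead of being processed up to the page limit.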
Magic keyword generator
The Problem: coming up with relevant keywords for scraping takes a lot of time, energy and external resources.
The Solution: Magic Keyword Generator. This magic keyword generator would be available under the keywords text field by way of a button "Magic Keyword Generator". Inside a separate window, we are going to have the following:
Related Searches Keyword Ideas from Search Engines
"[Icon of a Magic Wand] [check box] Automatically generate keywords by getting related keyword searches from the search engines".
"Select the search engines from which to get keyword ideas from related searches: [check box] Google [check box] Bing [check box] Yahoo [check box] Yandex [check box]"
"Crawling Depth [number]"
"Maximum Number of Keywords to Extract (0 unlimited)"
Keywords from Website Meta Titles
"[Icon of a Magic Wand] [check box] Automatically generate keywords by grabbing website meta titles".
"Maximum Number of Sites to crawl"
"Maximum Number of Keywords to Extract (0 unlimited)"
The idea behind these options is that we would allow a user to generate niche-related keywords on autopilot by entering their root keywords. The website scraper would take those keywords, run search engine searches and grab related keywords from 1) Related Searches Keyword Ideas from Search Engines and/or 2) Keywords from Website Meta Titles. This would help a user to automatically generate their own niche-targeted list of popular keywords.
Under this section, we will include the following options: "[check box] skip keyword suggestions that contain the root keyword", "[check box] skip keywords longer than [number] words"
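The two filter options could behave roughly like the sketch below. It is only an illustration: `filter_suggestions` and its parameters are hypothetical names. The lower-casing, the removal of repeated suggestions and the plain substring match on the root keyword are assumptions that the description above does not specify.

```python
from typing import Iterable, List


def filter_suggestions(
    suggestions: Iterable[str],
    root_keyword: str,
    skip_containing_root: bool = False,
    max_words: int = 0,
) -> List[str]:
    """Apply the two proposed check box filters to generated keywords.

    max_words == 0 means "no length limit" (an assumption that mirrors the
    "0 unlimited" convention used for the extraction limits).
    """
    root = root_keyword.strip().lower()
    kept: List[str] = []
    seen = set()
    for raw in suggestions:
        # Normalise case and whitespace so repeats are easy to spot.
        keyword = " ".join(raw.lower().split())
        if not keyword or keyword in seen:
            continue
        # "skip keyword suggestions that contain the root keyword"
        if skip_containing_root and root and root in keyword:
            continue
        # "skip keywords longer than [number] words"
        if max_words > 0 and len(keyword.split()) > max_words:
            continue
        seen.add(keyword)
        kept.append(keyword)
    return kept
```

For example, with the root keyword bitcoin, both filters enabled and a limit of 3 words, the suggestions "Bitcoin price", "crypto wallet" and "best crypto exchange for beginners" would be reduced to "crypto wallet".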
Ideas for Future Updates
Privacy and Proxies
- Under the Proxies tab, we should add two options: 1) Use Private Proxies and 2) Use Public Proxies. By checking one or the other option, the app would know what proxies to use. If a user ticks the public proxies option, the software should automatically search for new proxies, delete non-working proxies, etc. on autopilot without requiring any input from the user.
- When clicking on the "Public Proxies" button on the main GUI, we should also add one check box: Use Public Proxies. If a user checks this option, the software will also automatically check the Use Public Proxies on the Proxies tab inside the settings. This will help to clear up any confusion and speculation from the user's side. Inside the Public Proxies window, there are no options to control the public proxy sources. We can either keep everything as is and constantly add new public proxy sources with each update or we can allow the user to add their own public proxy sources. Also, we should have some form of an option to decide how many proxies a user wants to use.
- Scraping via a TOR browser. With this option enabled inside the proxy settings, the app would simply connect and scrape using the TOR browser. [Admin: Dismissed. We have enough privacy/anonymity settings already. The TOR browser will not provide a big advantage as it has a limited number of IPs and most search engines and website platforms can detect TOR sessions.]
- Scraping using Google Chrome Incognito window. [Admin: Dismissed. We have enough privacy/anonymity settings already.]
- Both options would be available by way of check boxes under the proxies tab. [Admin: Will be added in the next update]
- Under the public proxies tab, the app should have an option to auto check and verify the public proxies every X number of minutes (make sure that they are working), automatically remove non-working proxies and scrape new proxies every X number of minutes OR when the total working proxy number falls below X number of proxies. As well as allowing the user to upload and enter their own public proxy sources, we can have a list of ALL proxy source urls and the user can simply check using checkboxes which proxy sources they would like to use. The idea here is that the app will constantly monitor the proxies by removing non-working ones and scraping and adding new ones every so often to ensure that the app has enough proxies to run on at all times.
- The Footprints function should be simplified. Once a user opens up the footprints button, the app should simply give them a field to enter their own footprints inside a pane or upload a file. Next to the pane, we should have a check box to "Combine footprints with keywords". The app would then merge their keywords with every single footprint. For example, if we have 1 keyword and 20 footprints, this would give us 20 unique keywords: root keyword + footprints. The idea here is to save a user time and effort. For example, the footprints section could be used to search for guest posting opportunities or whatever a user likes.
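The "Combine footprints with keywords" step described in the last item amounts to pairing every keyword with every footprint. A minimal sketch follows, assuming a hypothetical helper name `combine_footprints` and assuming the keyword is placed before the footprint, separated by a space:

```python
from typing import Iterable, List


def combine_footprints(keywords: Iterable[str], footprints: Iterable[str]) -> List[str]:
    """Merge each keyword with every footprint (root keyword + footprint)."""
    # Drop blank lines, e.g. from a pasted pane or an uploaded file.
    cleaned_footprints = [f.strip() for f in footprints if f.strip()]
    queries: List[str] = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue
        for footprint in cleaned_footprints:
            queries.append(f"{keyword} {footprint}")
    return queries
```

One keyword combined with 20 footprints gives 20 search queries, matching the example above. For guest posting, the keyword bitcoin with the footprints "write for us" and "guest post" would produce two queries, one per footprint.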
Search results pane
- We could add an option to disable the real time view of results / disable GUI to reduce the consumption of processing power. We can simply add a check box with something along the lines of "Disable GUI for faster speeds".
- Inside each column name (e.g. URL, email, website, address) we should add a check box so that a user can select exactly what data to scrape. In the first column we could have one check box to select all or select none.
- We should add the business name to our results. We can get this via the Facebook business page.
Speeds and threads
- We could add an option to automatically change the number of threads if the CPU usage exceeds X% and the RAM usage exceeds X%. We could allow the user to enter / select any values they want. This is a good way to ensure the stability of the app.
- We could add an option to "Skip subdomain sites" as those tend to be web 2.0 and contain a lot of spam. This could allow us to save time and processing power.
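The automatic thread adjustment could be driven by a small decision function like the one below. This is a sketch only: `adjust_thread_count` and all of its parameters are hypothetical, the step size of one thread is an assumption, and the sketch lowers the thread count as soon as either limit is exceeded, which is one possible reading of the suggestion. The usage readings would have to come from elsewhere, for example from the third-party psutil package.

```python
def adjust_thread_count(
    current_threads: int,
    cpu_percent: float,
    ram_percent: float,
    cpu_limit: float = 80.0,
    ram_limit: float = 80.0,
    min_threads: int = 1,
    max_threads: int = 50,
) -> int:
    """Return the thread count to use for the next scraping interval.

    cpu_limit and ram_limit stand for the user-chosen X% thresholds.
    """
    if cpu_percent > cpu_limit or ram_percent > ram_limit:
        # Assumed policy: back off by one thread while the machine is overloaded.
        target = current_threads - 1
    else:
        # Assumed policy: recover by one thread while there is headroom.
        target = current_threads + 1
    # Never leave the allowed range.
    return max(min_threads, min(max_threads, target))
```

Calling this once per monitoring interval changes the load gradually. A real implementation would probably also want a gap between the "reduce" and "increase" thresholds so that the thread count does not oscillate around the limit.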
Extra scrapers / dictionaries
- We could add Instagram to the list. However, the specifics of how Instagram works are different to other sources. We should add some simple options under the Instagram drop-down to choose whether to search for users, hashtags or both. We should also add the ability to log in / add login details for an Instagram account under the last tab inside the settings.
Adding accounts under the last settings tab
- We should also add a field for adding a private proxy that will be tied to that account. This will ensure that the user always accesses social media accounts through one private proxy. Perhaps it would be a good idea to allow a user to enter their accounts inside a plain text pane in a format like username:password:proxy:platform (platform would be our shortcode that identifies the social network; these could be LinkedIn, Facebook, Twitter). This would make it quicker to enter accounts. If a user enters more than one account, the app could switch between accounts every X number of minutes.
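Parsing the proposed username:password:proxy:platform lines could look roughly like this. It is a sketch under assumptions: `Account` and `parse_accounts` are hypothetical names, and the platform shortcodes are simply lower-cased because their exact form is not defined above. Since a proxy is often written as host:port, the sketch takes the first two fields from the left and the platform from the right; a password that itself contains a colon would not parse correctly in this format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Account:
    username: str
    password: str
    proxy: str
    platform: str


def parse_accounts(text: str) -> List[Account]:
    """Parse one account per line in username:password:proxy:platform form."""
    accounts: List[Account] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the pane
        try:
            # First two fields from the left, platform from the right, so a
            # proxy written as host:port stays in one piece.
            username, password, rest = line.split(":", 2)
            proxy, platform = rest.rsplit(":", 1)
        except ValueError:
            raise ValueError(
                f"line {line_number}: expected username:password:proxy:platform"
            ) from None
        accounts.append(Account(username, password, proxy, platform.lower()))
    return accounts
```

A line such as alice:secret:10.0.0.1:8080:LinkedIn would yield the proxy 10.0.0.1:8080 and the platform linkedin. The resulting list could then be cycled through to switch accounts every X minutes.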