How to Build a Company Employee Dataset

Data has become one of the most valuable assets for any company. Having reliable and well-structured information about a business or its competitors can provide a significant strategic edge. In this tutorial, we’ll explore how to combine the precision of Google Dorks with the automation capabilities of Piloterr APIs to collect public LinkedIn profile data. The end result will be a structured .json dataset, ready for analysis.

Use case

Generate a list of employees from a specific company to build a dataset for HR purposes, recruitment efforts, or organizational structure analysis.

How Does It Work?

  1. Use Google Dorks to retrieve indexed LinkedIn profile links
  2. Automate the search using the Piloterr Google Search API
  3. Extract public data from LinkedIn profiles using the Piloterr LinkedIn Profile API
  4. Merge all the information into a clean dataset

Figure: schema of the workflow (made with Excalidraw)

This tutorial is divided into two standalone sections that you can follow in any order:

  • Step-by-step guide: How to use Google Dorks and Piloterr APIs.
  • Full project execution: Clone the repo and run the complete script.

Step-by-Step: How to Use Google Dorks and Piloterr APIs

In this chapter, we will learn, step by step, how to connect a Google Dork with the Piloterr Google Search API and the Piloterr LinkedIn Profile API.

What Is a Google Dork, and Why Is It Powerful?

Google Dorks are advanced search operators that help filter search results.

Google already indexes billions of pages, so we can take advantage of that index by crafting precise queries.

Build a dork to list public LinkedIn profiles related to Apple Inc.

To test and run a Google dork, we just need a Google search bar.

Let’s start with a sample dork that lists public LinkedIn profiles related to “Apple Inc.”

The search:

The result:


We get a list of people who have “Apple” in their LinkedIn profile.

It is not magic. Let’s break down the syntax to understand what’s happening:
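The exact query is not reproduced here, so the snippet below is only a sketch of what such a dork commonly looks like. It assumes the standard site: operator combined with a quoted phrase; build_dork is a hypothetical helper name, not part of any library.

```python
def build_dork(company, site="linkedin.com/in"):
    """Build a Google dork limited to LinkedIn profile pages mentioning a company."""
    # site:   keeps only results whose URL sits under the given domain and path
    # "..."   asks Google for the quoted phrase as an exact match
    return f'site:{site} "{company}"'


dork = build_dork("Apple")
print(dork)  # site:linkedin.com/in "Apple"
```

Restricting the path to linkedin.com/in targets personal profiles rather than company pages or job posts, and the quoted company name filters those profiles down to the ones that mention it.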

Search LinkedIn Profiles using Google Search API

Setting up the Piloterr API request:

Once your dork is ready, the next step is automating the search.

To complete the request:

  • Paste the Google Dork into the query field
  • Add your API key in the x-api-key header
  • Set request parameters (e.g., use page = 1 to fetch the first page)

Set the parameters:
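As a minimal sketch, the three settings listed above can be grouped into one dictionary. The query field, the x-api-key header and the page parameter come from the steps above; the endpoint URL and the build_search_request helper are assumptions, so confirm the URL in the Piloterr Google Search API docs before using it.

```python
# Assumed endpoint: check the Piloterr Google Search API docs for the exact URL.
GOOGLE_SEARCH_URL = "https://piloterr.com/api/v2/google/search"


def build_search_request(dork, api_key, page=1):
    """Group the URL, headers and query parameters for one search call."""
    return {
        "url": GOOGLE_SEARCH_URL,
        "headers": {"x-api-key": api_key},        # API key goes in the header
        "params": {"query": dork, "page": page},  # the dork is the search query
    }


request = build_search_request('site:linkedin.com/in "Apple"', api_key="YOUR_API_KEY")
print(request["params"])
```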

Run the request and print the result:
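One way to send the request uses only the Python standard library, as sketched below. The endpoint URL is an assumption (confirm it in the Piloterr docs), and build_get_request and fetch_json are hypothetical helper names. The network call is left commented out so the snippet runs without a real API key.

```python
import json
import urllib.parse
import urllib.request


def build_get_request(url, headers, params):
    """Encode the parameters into the query string and attach the headers."""
    full_url = f"{url}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(full_url, headers=headers, method="GET")


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body (performs a real network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed endpoint: check the Piloterr Google Search API docs for the exact URL.
request = build_get_request(
    "https://piloterr.com/api/v2/google/search",
    headers={"x-api-key": "YOUR_API_KEY"},
    params={"query": 'site:linkedin.com/in "Apple"', "page": 1},
)
print(request.full_url)

# Uncomment once a real API key is set:
# result = fetch_json(request)
# print(json.dumps(result, indent=2))
```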

Congratulations, we have our first data:

The search results, including the profile links, are stored under the key “organic_results”.

💡 Tip: Explore more parameters in the Piloterr Google Search API docs.

Extract Profile Data using Links from the Google Search Result

The Google Search API returns several keys such as pagination, search_parameters, search_information and organic_results.

But we’re only interested in the profile links found under organic_results.

Let’s save the first link in profile_url by accessing results[0]['link'], where results is the list stored under organic_results.
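The sketch below shows that extraction on a trimmed, made-up response. Only the key names (organic_results, link) come from the description above; the sample URLs are placeholders, and extract_profile_links is a hypothetical helper.

```python
def extract_profile_links(search_response):
    """Return the LinkedIn profile links found under organic_results."""
    results = search_response.get("organic_results", [])
    # Keep only entries that carry a link pointing at a personal profile page
    return [item["link"] for item in results if "linkedin.com/in/" in item.get("link", "")]


# Illustrative response: the real one also carries pagination,
# search_parameters and search_information.
sample_response = {
    "organic_results": [
        {"link": "https://www.linkedin.com/in/example-profile-1"},
        {"link": "https://www.linkedin.com/company/example"},
        {"link": "https://www.linkedin.com/in/example-profile-2"},
    ]
}

links = extract_profile_links(sample_response)
profile_url = links[0]
print(profile_url)  # https://www.linkedin.com/in/example-profile-1
```

Filtering on linkedin.com/in/ is a defensive choice: it drops company pages or posts that may slip into the results.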

Then, send that link to Piloterr’s LinkedIn Profile API using the query parameter.

Set the request parameters:
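A possible shape for these parameters is sketched below. The text above only states that the profile link goes into the query parameter; the endpoint URL and the build_profile_request helper are assumptions, so check the Piloterr LinkedIn API docs for the exact URL. The profile link is a placeholder.

```python
# Assumed endpoint: check the Piloterr LinkedIn Profile API docs for the exact URL.
LINKEDIN_PROFILE_URL = "https://piloterr.com/api/v2/linkedin/profile/info"


def build_profile_request(profile_url, api_key):
    """Group the URL, headers and parameters for one profile lookup."""
    # Fail early on links that are not personal profile pages
    if "linkedin.com/in/" not in profile_url:
        raise ValueError(f"Not a LinkedIn profile URL: {profile_url}")
    return {
        "url": LINKEDIN_PROFILE_URL,
        "headers": {"x-api-key": api_key},
        "params": {"query": profile_url},  # the profile link is the query
    }


request = build_profile_request(
    "https://www.linkedin.com/in/example-profile", api_key="YOUR_API_KEY"
)
print(request["params"])
```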

Run the request and print the result, and that’s it! You now have public data from a real LinkedIn profile:

💡 Tip: The full API reference is in the Piloterr LinkedIn API docs.

Clone the Project and Run the Full Script

Get the project

Clone the repository: https://github.com/harivonyR/LinkedIn_company_employee_scrap

Set up the dependencies:

Set Up Your API Key

Copy the example credentials file by running this command:

Edit ‘credential.py’ and paste your API key:

Choose Your Target Company and Set Result Limits

Company:

The default target company is set to Apple Inc. in main.py. You can change it as you wish:

Limit Google Results:

Google Search can return several pages of results. Adjust the range of pages to search and the number of links to process to save resources and time:
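The idea of bounding both the pages and the links can be sketched as follows. This is not the code from main.py: collect_links, first_page, last_page and max_links are hypothetical names, and fake_search stands in for a real call to the search API.

```python
def collect_links(search_page, first_page=1, last_page=3, max_links=20):
    """Gather unique result links page by page, stopping at max_links."""
    if max_links <= 0:
        return []
    links, seen = [], set()
    for page in range(first_page, last_page + 1):
        response = search_page(page)  # one API call per page
        for item in response.get("organic_results", []):
            link = item.get("link")
            if link and link not in seen:
                seen.add(link)
                links.append(link)
                if len(links) >= max_links:
                    return links  # stop early: later pages are never requested
    return links


def fake_search(page):
    """Stand-in for the search API: three placeholder links per page."""
    return {
        "organic_results": [
            {"link": f"https://www.linkedin.com/in/example-{page}-{i}"} for i in range(3)
        ]
    }


print(collect_links(fake_search, first_page=1, last_page=2, max_links=4))
```

Returning as soon as the cap is reached avoids requesting pages whose results would be discarded anyway.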

Run the Pipeline:

This will:

  • Perform a Google Dork search for the specified company
  • Fetch LinkedIn profile details
  • Export them into output/linkedin_profile_dataset.json
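The three steps above can be sketched as a single function. This is an illustration under stated assumptions, not the repository's actual code: run_pipeline is a hypothetical name, and search_links and fetch_profile are placeholders for the functions that call the two Piloterr APIs.

```python
import json
from pathlib import Path


def run_pipeline(company, search_links, fetch_profile,
                 output_path="output/linkedin_profile_dataset.json"):
    """Search for profile links, fetch each profile, and export the list as JSON."""
    profiles = []
    for link in search_links(company):
        try:
            profiles.append(fetch_profile(link))
        except Exception as error:  # broad on purpose: one bad profile should not stop the run
            print(f"Skipping {link}: {error}")

    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create output/ if missing
    with path.open("w", encoding="utf-8") as handle:
        json.dump(profiles, handle, ensure_ascii=False, indent=2)
    return path


# Usage, with your own API-backed functions passed in:
# run_pipeline("Apple", search_links=my_search, fetch_profile=my_fetch)
```

Passing the two API functions as arguments keeps the export logic testable without network access.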

Test a Single Profile for Debugging (Optional):

Test() is a special function in main.py designed to test and debug the workflow by executing each part step-by-step.

To run the test, just use:

Now you're ready to automate public LinkedIn employee data extraction with a clean and reusable script.