A Docker-containerized FastAPI application that provides secure API access to the Python JobSpy library, allowing you to search for jobs across multiple platforms including LinkedIn, Indeed, Glassdoor, Google, ZipRecruiter, Bayt, and Naukri.
- Comprehensive Job Search: Search across multiple job boards with a single API call
- API Key Authentication: Secure your API with x-api-key header authentication
- Rate Limiting: Prevent abuse with configurable rate limits
- Caching: Improve performance with response caching
- Proxy Support: Configure global proxies via environment variables
- Customizable Defaults: Set default search parameters via environment variables
- CORS Support: Enable cross-origin requests for frontend integration
- Health Checks: Monitor application health with dedicated endpoints
- Comprehensive Logging: Track API usage and troubleshoot issues
If you find JobSpy Docker API useful, please consider:
- ⭐️ Star this repository on GitHub: https://github.com/rainmanjam/jobspy-api
- 🍴 Fork it to contribute and customize.
- 👤 Follow the repo to stay updated with new features and releases.
- 📥 Pull the Docker image from Docker Hub:
docker pull rainmanjam/jobspy-api:latest
or visit https://hub.docker.com/r/rainmanjam/jobspy-api and pull it from there.
- Docker
- Docker Compose (optional, but recommended)
# Build the Docker image
docker build -t jobspy-api .
# Run the container
docker run -p 8000:8000 \
-e API_KEYS=your-api-key-1 \
-e ENABLE_API_KEY_AUTH=true \
jobspy-api
You can configure the application by passing environment variables:
docker run -p 8000:8000 \
-e API_KEYS=your-api-key-1,your-api-key-2 \
-e DEFAULT_COUNTRY_INDEED=USA \
-e DEFAULT_PROXIES=user:pass@host:port,user:pass@host:port \
-e LOG_LEVEL=INFO \
jobspy-api
- Edit the environment variables in docker-compose.yml to match your requirements:
environment:
# API Security
- API_KEYS=your-api-key-1,your-api-key-2
- ENABLE_API_KEY_AUTH=true
# Proxy Configuration (if needed)
- DEFAULT_PROXIES=user:pass@host:port,user:pass@host:port
# Other settings as needed
- DEFAULT_COUNTRY_INDEED=USA
- Start the application with Docker Compose:
docker-compose up -d
- Access the API documentation at http://localhost:8000/docs
For development with auto-reload:
# Uses docker-compose.dev.yml which mounts local directory and enables auto-reload
docker-compose -f docker-compose.dev.yml up
# If running with docker-compose
docker-compose down
# If running with docker
docker stop <container_id>
The project includes a Makefile for common tasks:
# Show all available commands
make help
# Basic commands
make install # Install dependencies
make run # Run development server
make test # Run tests
make docker-build # Build Docker image with both version and latest tags
make docker-buildx # Build multi-arch Docker image with both version and latest tags
make docker-push # Push Docker image to Docker Hub (both version and latest tags)
make docker-pushx # Push multi-arch Docker image to Docker Hub (both version and latest tags)
make docker-compose-up # Start with Docker Compose (production)
make docker-compose-dev # Start with Docker Compose (development)
# Combined commands for streamlined workflows
make dev # Run development server with auto-reload
make prod # Build and run production container
make clean-start # Remove containers, rebuild and start
make update # Update dependencies and rebuild
make test-and-build # Run tests and build if they pass
make ci # Run full CI pipeline (test, build, run)
make logs # Show logs from running containers
make restart # Restart running containers
make rebuild # Rebuild and restart containers
You can configure the application using environment variables:
Variable | Description | Default |
---|---|---|
API Security | ||
API_KEYS | Comma-separated list of valid API keys | [] |
ENABLE_API_KEY_AUTH | Enable API key authentication | true |
API_KEY_HEADER_NAME | Header name for API key | x-api-key |
Rate Limiting | ||
RATE_LIMIT_ENABLED | Enable rate limiting | true |
RATE_LIMIT_REQUESTS | Maximum requests per timeframe | 100 |
RATE_LIMIT_TIMEFRAME | Timeframe for rate limiting in seconds | 3600 |
Proxy Configuration | ||
DEFAULT_PROXIES | Comma-separated list of proxies | [] |
CA_CERT_PATH | Path to CA Certificate file for proxies | null |
JobSpy Default Settings | ||
DEFAULT_SITE_NAMES | Default job boards to search | all available boards |
DEFAULT_RESULTS_WANTED | Default number of results per site | 20 |
DEFAULT_DISTANCE | Default distance in miles | 50 |
DEFAULT_DESCRIPTION_FORMAT | Format of job description | markdown |
DEFAULT_COUNTRY_INDEED | Default country for Indeed searches | null |
Caching | ||
ENABLE_CACHE | Enable response caching | true |
CACHE_EXPIRY | Cache expiry time in seconds | 3600 |
Logging & CORS | ||
LOG_LEVEL | Logging level (INFO, DEBUG, etc.) | INFO |
ENVIRONMENT | Environment name (development, production) | development |
CORS_ORIGINS | Allowed origins for CORS | * |
API Documentation | ||
ENABLE_SWAGGER_UI | Enable Swagger UI docs | true |
ENABLE_REDOC | Enable ReDoc documentation | true |
SWAGGER_UI_PATH | URL path for Swagger UI | /docs |
REDOC_PATH | URL path for ReDoc | /redoc |
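As a rough illustration of how the list-valued, boolean, and numeric settings in the table might be read, the sketch below parses a few of them from an environment mapping. The helper names (parse_list, parse_bool, load_settings) and the accepted truthy spellings are assumptions made for this example, not the project's actual loader; the defaults mirror the table above.

```python
import os
from typing import Mapping


def parse_list(raw: str) -> list[str]:
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_bool(raw: str) -> bool:
    """Treat common truthy spellings as True (an assumption); anything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical loader: read a few variables with the documented defaults."""
    return {
        "api_keys": parse_list(env.get("API_KEYS", "")),
        "enable_api_key_auth": parse_bool(env.get("ENABLE_API_KEY_AUTH", "true")),
        "default_proxies": parse_list(env.get("DEFAULT_PROXIES", "")),
        "rate_limit_requests": int(env.get("RATE_LIMIT_REQUESTS", "100")),
        "rate_limit_timeframe": int(env.get("RATE_LIMIT_TIMEFRAME", "3600")),
        "cache_expiry": int(env.get("CACHE_EXPIRY", "3600")),
    }


settings = load_settings({"API_KEYS": "your-api-key-1, your-api-key-2"})
print(settings["api_keys"])
```

Passing a plain dictionary instead of os.environ makes it easy to check how a given combination of variables would be interpreted without starting a container.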
The application follows a specific precedence for loading environment variables:
- Command line arguments
- Docker Compose environment section
- .env file in the project root
- Dockerfile ENV values
- Dockerfile ARG defaults
Note: .env.local is not loaded automatically by default. It's only used when:
- Explicitly loaded in your code
- Specified in the env_file section of docker-compose.yml
- Using the development setup with docker-compose.dev.yml
To explicitly load .env.local:
# Run the helper script before starting the application
python scripts/load_local_env.py
This loading order is important to understand when troubleshooting environment variable issues:
- Values from higher in the list override values from lower in the list
- When values appear to be incorrect, check at which level they're being defined
- Docker Compose environment variables can override .env values, which is a common source of confusion
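The override rule can be pictured as a lookup through an ordered list of sources, where the first source that defines a variable wins. The following is only a sketch of that idea: the resolve function, the layer names, and the sample values are made up for illustration and are not part of the project.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Each layer is (source name, variables defined there),
# ordered from highest to lowest priority.
Layer = Tuple[str, Mapping[str, str]]


def resolve(name: str, layers: Sequence[Layer]) -> Optional[Tuple[str, str]]:
    """Return (value, source) from the first layer that defines `name`."""
    for source, values in layers:
        if name in values:
            return values[name], source
    return None


# Hypothetical sample values, one dictionary per source.
layers = [
    ("command line", {}),
    ("docker-compose environment", {"LOG_LEVEL": "INFO"}),
    (".env file", {"LOG_LEVEL": "DEBUG", "ENABLE_CACHE": "false"}),
    ("Dockerfile ENV", {"ENABLE_CACHE": "true"}),
]

print(resolve("LOG_LEVEL", layers))     # ('INFO', 'docker-compose environment')
print(resolve("ENABLE_CACHE", layers))  # ('false', '.env file')
```

In this example the LOG_LEVEL=DEBUG line in .env has no effect because the Docker Compose environment section defines the same variable at a higher level, which is the situation the last bullet describes.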
The project includes several scripts to help debug environment variable issues:
# Check environment variables and configuration
python scripts/check_env.py
# Verify environment variable loading
python scripts/verify_env_loading.py
# Debug environment variable conflicts
python scripts/debug_env_conflicts.py
# Check configuration consistency
python scripts/check_config_consistency.py
# Inside Docker container
docker-compose run --rm jobspy-api python /app/scripts/check_env.py
services:
jobspy-api:
# ...
environment:
# Disable API documentation in production
- ENABLE_SWAGGER_UI=false
- ENABLE_REDOC=false
docker run -p 8000:8000 \
-e ENABLE_SWAGGER_UI=false \
-e ENABLE_REDOC=false \
jobspy-api
ENABLE_SWAGGER_UI=false
ENABLE_REDOC=false
All API endpoints require an API key to be passed in the x-api-key header if authentication is enabled.
- GET /api/v1/search_jobs - Search for jobs with optional pagination and output format (format=json|csv)
- GET /health - Returns the health status of the API
- GET /ping - Simple ping endpoint for monitoring
Parameter | Type | Description | Default |
---|---|---|---|
format | string | Output format: json (default) or csv. If csv, returns a downloadable CSV file. | json |
site_name | list or string | Job sites to search on (indeed, linkedin, zip_recruiter, glassdoor, google, bayt, naukri) | all |
search_term | string | Job search term | |
google_search_term | string | Search term for Google jobs (only parameter for filtering Google jobs) | |
location | string | Job location | |
distance | integer | Distance in miles | 50 |
job_type | string | Job type (fulltime, parttime, internship, contract) | |
proxies | list | List of proxies in format user:pass@host:port | |
is_remote | boolean | Remote job filter | |
results_wanted | integer | Number of job results per site | 20 |
easy_apply | boolean | Filters for jobs hosted on the job board site | |
description_format | string | Format of job description (markdown, html) | markdown |
offset | integer | Start search from this offset | |
hours_old | integer | Filter jobs by hours since posted | |
verbose | integer | Controls verbosity (0=errors, 1=warnings, 2=all logs) | 2 |
linkedin_fetch_description | boolean | Fetch full description and direct job url for LinkedIn | |
linkedin_company_ids | list[int] | Search LinkedIn jobs with specific company ids | |
country_indeed | string | Country for Indeed & Glassdoor | |
enforce_annual_salary | boolean | Converts wages to annual salary | |
ca_cert | string | Path to CA Certificate file for proxies | |
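For callers using Python rather than curl, a request with these parameters could be assembled along the following lines. This is a minimal sketch using only the standard library; build_search_request is a hypothetical helper, and sending list parameters such as site_name as repeated query keys is an assumption about how the API accepts lists.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, api_key: str, **params) -> Request:
    """Build (but do not send) a GET request for /api/v1/search_jobs."""
    # doseq=True expands list values into repeated query keys,
    # e.g. site_name=indeed&site_name=linkedin (assumed list encoding).
    query = urlencode(params, doseq=True)
    url = f"{base_url}/api/v1/search_jobs?{query}"
    return Request(url, headers={"x-api-key": api_key, "accept": "application/json"})


req = build_search_request(
    "http://localhost:8000",
    "your-api-key",
    site_name=["indeed", "linkedin"],
    search_term="software engineer",
    results_wanted=5,
)
print(req.full_url)
```

The request object can then be passed to urllib.request.urlopen to perform the call against a running container.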
You can request results as a CSV file by adding ?format=csv to your request:
curl -X 'GET' 'http://localhost:8000/api/v1/search_jobs?site_name=indeed&search_term=engineer&format=csv' -H 'accept: text/csv' -o jobs.csv
The response will be a downloadable CSV file with all job fields.
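Once downloaded, the file can be read with Python's csv module. The sketch below uses a made-up three-column sample in place of a real jobs.csv; the exact column names and their casing in the actual export are an assumption here, so check the header row of your own file.

```python
import csv
import io


def parse_jobs_csv(text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample standing in for the contents of jobs.csv.
sample = "title,company,job_url\nEngineer,Example Corp,https://example.com/jobs/1\n"
rows = parse_jobs_csv(sample)
print(rows[0]["title"], "at", rows[0]["company"])  # Engineer at Example Corp
```

For a file on disk, open it with newline="" and pass the file object to csv.DictReader directly.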
The API returns job objects with the following fields (fields may vary by provider):
Field | Description | Providers |
---|---|---|
title | Job title | All |
company | Company name | All |
company_url | Company website | All |
job_url | Direct job posting URL | All |
location.country | Country | All |
location.city | City | All |
location.state | State | All |
is_remote | Remote job flag | All |
description | Job description | All |
job_type | Job type (fulltime, parttime, etc.) | All |
job_function | Job function/category | All |
interval | Salary interval (yearly, monthly, etc.) | All |
min_amount | Minimum salary | All |
max_amount | Maximum salary | All |
currency | Salary currency | All |
salary_source | Source of salary info | All |
date_posted | Date posted | All |
emails | Emails found in posting | All |
job_level | Job level | |
company_industry | Company industry | LinkedIn, Indeed |
company_country | Company country | Indeed |
company_addresses | Company addresses | Indeed |
company_employees_label | Company size label | Indeed |
company_revenue_label | Company revenue label | Indeed |
company_description | Company description | Indeed |
company_logo | Company logo URL | Indeed |
skills | Required skills | Naukri |
experience_range | Experience range | Naukri |
company_rating | Company rating | Naukri |
company_reviews_count | Company reviews count | Naukri |
vacancy_count | Number of vacancies | Naukri |
work_from_home_type | Work from home type | Naukri |
Argentina | Australia* | Austria* | Bahrain |
Belgium* | Brazil* | Canada* | Chile |
China | Colombia | Costa Rica | Czech Republic |
Denmark | Ecuador | Egypt | Finland |
France* | Germany* | Greece | Hong Kong* |
Hungary | India* | Indonesia | Ireland* |
Israel | Italy* | Japan | Kuwait |
Luxembourg | Malaysia | Mexico* | Morocco |
Netherlands* | New Zealand* | Nigeria | Norway |
Oman | Pakistan | Panama | Peru |
Philippines | Poland | Portugal | Qatar |
Romania | Saudi Arabia | Singapore* | South Africa |
South Korea | Spain* | Sweden | Switzerland* |
Taiwan | Thailand | Turkey | Ukraine |
United Arab Emirates | UK* | USA* | Uruguay |
Venezuela | Vietnam* |
(* indicates also supported by Glassdoor)
- If you receive a 429 error, you've exceeded the rate limit or the underlying job boards are blocking requests
- For Google jobs, use very specific search terms in the google_search_term parameter
- For Indeed searches, use precise search syntax with quotes and operators
- For high-volume usage, configure proxies to avoid being blocked
- Container exits immediately: Check the logs with docker logs <container_id>
- Can't access the API: Make sure ports are correctly mapped and the container is running
- API key issues: Ensure API_KEYS environment variable is set correctly
- Proxy issues: If using proxies, make sure they're correctly formatted and working
- Permission issues: If mounting volumes, ensure proper permissions are set
  - Shell scripts need execute permissions: chmod +x scripts/*.sh
  - For Windows users, Git may change line endings - use git config --global core.autocrlf input
  - The container uses a special entrypoint script that fixes permissions automatically
- Image tags: Both latest and the version number are pushed to Docker Hub. If you don't see the version tag, ensure you are using the latest Makefile and pushing with make docker-push or make docker-pushx.
- Ensure all scripts have execute permissions:
chmod +x scripts/*.sh
- For Windows users, ensure line endings are correct:
git config --global core.autocrlf input
If you're experiencing issues with environment variables:
- Verify variable values: Use the debugging scripts to see which values are active
python scripts/check_env.py
- Check variable precedence: Remember that Docker Compose environment values override .env files
# See the full override chain
bash scripts/debug_env_load_order.sh
- Watch for conflicts: Look for conflicting definitions in different places
python scripts/debug_env_conflicts.py
- Docker environment: When running in Docker, use this command to debug
docker-compose run --rm jobspy-api bash -c "env | grep -E 'API_KEY|ENABLE_|LOG_LEVEL'"
- Inspect container: If necessary, inspect the container directly
docker-compose exec jobspy-api bash
- API configuration endpoint: If the application is running, check
http://localhost:8000/api-config
http://localhost:8000/config-sources
- API authentication not working:
  - Ensure ENABLE_API_KEY_AUTH=true and API_KEYS is set correctly
  - Verify you're including the proper header in requests (x-api-key)
  - Check /auth-status endpoint for diagnostics
- Container fails to start:
  - Check logs with docker-compose logs jobspy-api
  - Ensure script permissions are correct: chmod +x scripts/*.sh
  - Try running with the debug configuration: docker-compose -f docker-compose.dev.yml up
- API errors with 500 status code:
  - Check Docker logs for detailed error information
  - Increase logging level: LOG_LEVEL=DEBUG
  - Look for specific scraper errors related to job boards
- Changes to .env file not taking effect:
  - Remember Docker Compose may have overriding environment variables
  - Rebuild container: docker-compose build then docker-compose up -d
  - Check effective values with /config-sources endpoint
The JobSpy Docker API follows Semantic Versioning (MAJOR.MINOR.PATCH).
# View the current version
python -c "from app import __version__; print(__version__)"
# Or using make
make version
Current version: 1.0.0
The project provides convenient commands for updating version numbers:
# Increment patch version (1.0.0 -> 1.0.1)
make version-patch
# Increment minor version (1.0.0 -> 1.1.0)
make version-minor
# Increment major version (1.0.0 -> 2.0.0)
make version-major
These commands update the version in app/__init__.py automatically.
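The increments shown in the comments above follow the usual Semantic Versioning rule that bumping a part resets the parts to its right. The function below is a small sketch of that rule only; bump_version is a hypothetical name and is not how the Makefile targets are implemented.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the given part ('major', 'minor', 'patch') incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("1.0.0", "patch"))  # 1.0.1
print(bump_version("1.0.0", "minor"))  # 1.1.0
print(bump_version("1.0.0", "major"))  # 2.0.0
```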
- Update the version number using the appropriate make command
- Update the CHANGELOG.md with details of changes
- Commit changes:
git commit -am "Bump version to X.Y.Z"
- Create a git tag:
git tag -a vX.Y.Z -m "Version X.Y.Z"
- Push changes:
git push && git push --tags
- Go to GitHub and create a new release based on the tag
- Navigate to: https://github.com/[username]/job-spy-fastapi/releases/new
- Select the tag
- Add release notes
- Publish the release
Releases are automatically published to Docker Hub on new GitHub releases:
# Build and push to Docker Hub (both version and latest tags)
make docker-push
This will build the Docker image with the current version tag and the latest tag and publish both to Docker Hub.
For multi-arch builds:
make docker-pushx
You can run a specific version of the API using Docker:
# Pull a specific version
docker pull username/jobspy-api:1.0.0
# Or pull the latest tag
docker pull username/jobspy-api:latest
# Run a specific version
docker run -p 8000:8000 username/jobspy-api:1.0.0
# Or run the latest
docker run -p 8000:8000 username/jobspy-api:latest
The project maintains a detailed changelog in the CHANGELOG.md file, which includes:
- New features
- Bug fixes
- Breaking changes
- Deprecation notices
Always check the changelog before upgrading to a new version, especially for major releases.
The JobSpy Docker API uses URL path versioning to ensure backward compatibility as the API evolves.
- v1 - The current stable API version (e.g., /api/v1/search_jobs)
- All API endpoints are versioned with a v{number} in the URL path
- Breaking changes will only be introduced in new API versions
- Older API versions will remain supported for a reasonable deprecation period
- Non-breaking enhancements may be added to existing versions
Always include the version in your API requests:
# Using the v1 API
curl -X 'GET' \
'http://localhost:8000/api/v1/search_jobs?site_name=indeed&search_term=software%20engineer' \
-H 'accept: application/json' \
-H 'x-api-key: your-api-key'
- Current - v1 (active development, fully supported)
- Future - When v2 is released, v1 will enter maintenance mode
- Deprecated - Versions in this state will be announced with a timeline for removal
- Retired - Versions that are no longer available
Version deprecation notices will be posted in release notes and the API will return deprecation warning headers for endpoints approaching retirement.
Parameter | Type | Description |
---|---|---|
Pagination Parameters | ||
paginate | boolean | Enable pagination (default: false) |
page | integer | Page number when pagination is enabled (default: 1) |
page_size | integer | Number of results per page when pagination is enabled (default: 10, max: 100) |
Basic Search Parameters | ||
site_name | list or string | Job sites to search on (indeed, linkedin, zip_recruiter, glassdoor, google, bayt, naukri) |
search_term | string | Job search term |
google_search_term | string | Search term for Google jobs (only parameter for filtering Google jobs) |
location | string | Job location |
distance | integer | Distance in miles (default: 50) |
Job Filters | ||
job_type | string | Job type (fulltime, parttime, internship, contract) |
is_remote | boolean | Remote job filter |
hours_old | integer | Filters jobs by the number of hours since the job was posted |
easy_apply | boolean | Filters for jobs that are hosted on the job board site |
Advanced Parameters | ||
results_wanted | integer | Number of results per site (default: 20) |
description_format | string | Format of job description (markdown, html) (default: markdown) |
offset | integer | Starts the search from an offset |
verbose | integer | Controls verbosity (0: errors only, 1: errors+warnings, 2: all logs) (default: 2) |
linkedin_fetch_description | boolean | Fetch full LinkedIn descriptions (slower) (default: false) |
linkedin_company_ids | list of integers | LinkedIn company IDs to filter by |
country_indeed | string | Country filter for Indeed & Glassdoor (default: USA) |
enforce_annual_salary | boolean | Convert wages to annual salary (default: false) |
ca_cert | string | Path to CA Certificate file for proxies |
The API returns results in two possible formats, depending on whether pagination is enabled:
Without pagination:
{
"count": 42,
"jobs": [
{
"SITE": "linkedin",
"TITLE": "Software Engineer",
"COMPANY": "Example Corp",
"LOCATION": "San Francisco, CA",
"DATE": "2023-06-01",
"LINK": "https://www.linkedin.com/jobs/view/123456789",
"DESCRIPTION": "Job description markdown text...",
// ...additional job fields
},
// ...more jobs
],
"cached": false
}
With pagination:
{
"count": 42,
"total_pages": 5,
"current_page": 1,
"page_size": 10,
"jobs": [
// ...array of job objects (max 10 in this example)
],
"cached": false,
"next_page": "http://localhost:8000/api/v1/search_jobs?paginate=true&page=2&...",
"previous_page": null
}
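A client can walk through a paginated result set by following next_page until it is empty. The sketch below shows one way to do that; iter_jobs and total_pages are hypothetical helpers, the fetch argument stands in for a real HTTP call, and the assumption that next_page is null on the last page is inferred from previous_page being null on the first.

```python
import math
from typing import Callable, Iterator


def total_pages(count: int, page_size: int) -> int:
    """Number of pages needed to hold `count` results."""
    return math.ceil(count / page_size)


def iter_jobs(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield jobs from every page by following `next_page` until it is null."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["jobs"]
        url = page.get("next_page")


# Stand-in for a real HTTP call: two fake pages keyed by URL.
fake_pages = {
    "page1": {"jobs": [{"TITLE": "A"}, {"TITLE": "B"}], "next_page": "page2"},
    "page2": {"jobs": [{"TITLE": "C"}], "next_page": None},
}
titles = [job["TITLE"] for job in iter_jobs("page1", fake_pages.__getitem__)]
print(titles)               # ['A', 'B', 'C']
print(total_pages(42, 10))  # 5
```

The last line matches the example response above, where 42 results at a page size of 10 give 5 pages.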
Results are cached based on search parameters to improve performance and reduce load on job sites:
- Cache is enabled by default but can be disabled using the ENABLE_CACHE environment variable
- Default cache expiry is 1 hour (3600 seconds), configurable via CACHE_EXPIRY
- The cached field in the response indicates whether results came from cache
- Cached results are returned only when the exact same search parameters are used
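To make the "exact same search parameters" rule concrete, here is a minimal sketch of a parameter-keyed cache with expiry. It is not the project's cache implementation; the TTLCache class, the cache_key function, and the choice to ignore parameter order are all assumptions made for illustration.

```python
import json
import time
from typing import Any, Optional


def cache_key(params: dict) -> str:
    """Build a key that is identical for identical parameters, whatever their order."""
    return json.dumps(params, sort_keys=True)


class TTLCache:
    """Tiny in-memory cache whose entries expire after `expiry` seconds."""

    def __init__(self, expiry: float = 3600.0) -> None:
        self.expiry = expiry
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, params: dict, now: Optional[float] = None) -> Optional[Any]:
        now = time.time() if now is None else now
        entry = self._store.get(cache_key(params))
        # Miss if nothing is stored or the entry is older than the expiry.
        if entry is None or now - entry[0] >= self.expiry:
            return None
        return entry[1]

    def set(self, params: dict, value: Any, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._store[cache_key(params)] = (now, value)


cache = TTLCache(expiry=3600)
cache.set({"site_name": "indeed", "search_term": "engineer"}, ["job1"], now=0)
print(cache.get({"search_term": "engineer", "site_name": "indeed"}, now=10))    # ['job1']
print(cache.get({"search_term": "engineer", "site_name": "indeed"}, now=3600))  # None
print(cache.get({"search_term": "designer", "site_name": "indeed"}, now=10))    # None
```

Changing any parameter, even slightly, produces a different key and therefore a fresh search, which is why small variations in a query do not benefit from cached results.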
For Indeed searches, only one from this list can be used in a search:
- hours_old
- job_type & is_remote
- easy_apply
For LinkedIn searches, only one from this list can be used in a search:
- hours_old
- easy_apply
The API provides descriptive error responses with suggestions for fixing common issues:
When you provide invalid parameters, the API will return:
- The invalid parameter
- What was wrong with it
- Valid options to use instead
- Suggestions for fixing the issue
Example response for an invalid site name:
{
"error": "Invalid job site name(s)",
"invalid_values": ["linkdin"],
"valid_sites": ["indeed", "linkedin", "zip_recruiter", "glassdoor", "google", "bayt", "naukri"],
"suggestions": [
{
"parameter": "site_name",
"message": "'linkdin' is not a valid job site",
"suggestion": "Use one or more of the valid job sites: indeed, linkedin, zip_recruiter, glassdoor, google, bayt, naukri",
"expected_type": "string or list",
"description": "Job sites to search on (e.g., indeed, linkedin)"
}
]
}
Some parameters cannot be used together. The API will explain these limitations:
{
"error": "Invalid parameter combination for Indeed",
"message": "Indeed searches cannot combine hours_old with job_type, is_remote, or easy_apply",
"suggestion": "Use either hours_old OR job filtering parameters, but not both"
}
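A client can catch this conflict before sending a request. The sketch below checks only the hours_old conflict named in the message above and returns the same payload shape; check_indeed_filters is a hypothetical helper, and treating any non-null value as "set" is an assumption rather than the API's confirmed validation logic.

```python
from typing import Optional


def check_indeed_filters(params: dict) -> Optional[dict]:
    """Return an error payload if hours_old is combined with other Indeed filters."""
    uses_hours_old = params.get("hours_old") is not None
    other_filters = [
        name
        for name in ("job_type", "is_remote", "easy_apply")
        if params.get(name) is not None
    ]
    if uses_hours_old and other_filters:
        return {
            "error": "Invalid parameter combination for Indeed",
            "message": "Indeed searches cannot combine hours_old with "
            "job_type, is_remote, or easy_apply",
            "suggestion": "Use either hours_old OR job filtering parameters, but not both",
        }
    return None


print(check_indeed_filters({"hours_old": 24, "job_type": "fulltime"}) is not None)  # True
print(check_indeed_filters({"hours_old": 24}))  # None
```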
For other errors, the API will suggest potential fixes:
{
"error": "Error scraping jobs",
"message": "Connection timed out",
"suggestion": "The request timed out. Try reducing the number of job sites or results_wanted"
}
The API returns standard HTTP status codes:
- 200 OK - Request was successful
- 400 Bad Request - Invalid parameters
- 403 Forbidden - Missing or invalid API key
- 404 Not Found - Requested page not found (when using pagination)
- 429 Too Many Requests - Rate limit exceeded
- 500 Internal Server Error - Server error (usually from job board sites)
Error responses include detailed information:
{
"error": "Error type",
"detail": "Detailed error message",
"status_code": 400,
"path": "/api/v1/search_jobs"
}
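On the client side, the status code and the error body can be combined into a short log line. The following is a sketch under the assumption that error bodies are JSON objects with a detail or message field, as in the examples above; summarize_error and STATUS_HINTS are hypothetical names.

```python
import json

# Status codes and meanings as listed in this document.
STATUS_HINTS = {
    400: "Invalid parameters",
    403: "Missing or invalid API key",
    404: "Requested page not found",
    429: "Rate limit exceeded",
    500: "Server error",
}


def summarize_error(status_code: int, body: str) -> str:
    """Turn an error response into a one-line summary for logs."""
    hint = STATUS_HINTS.get(status_code, "Unexpected status")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    detail = payload.get("detail") or payload.get("message") or "no detail provided"
    return f"{status_code} {hint}: {detail}"


body = '{"error": "Error type", "detail": "Detailed error message", "status_code": 400}'
print(summarize_error(400, body))        # 400 Invalid parameters: Detailed error message
print(summarize_error(429, "not json"))  # 429 Rate limit exceeded: no detail provided
```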