WebTools
The WebTools class provides a comprehensive set of static methods for interacting with various web services and APIs. These tools enable developers to perform web searches, scrape websites, query academic databases, and retrieve weather information with ease. By leveraging these methods, you can enhance your applications with rich, real-time data from across the internet.
Class Methods Overview
exa_search: Perform a web search using the Exa Search API
scrape_urls: Scrape content from a list of URLs
serper_search: Perform a web search using the Serper API
scrape_url_with_serper: Scrape content from a URL using the Serper API
query_arxiv_api: Query the arXiv API for academic papers
get_weather_data: Retrieve weather data for a specified location
Let's dive into each method in detail, exploring their functionality, parameters, return values, and practical examples.
exa_search
Method Signature
@staticmethod
def exa_search(queries: Union[str, List[str]], num_results: int = 10, search_type: str = "neural", num_sentences: int = 3, highlights_per_url: int = 3) -> dict:
Parameters
queries (Union[str, List[str]]): A single query string or a list of query strings to search for.
num_results (int, optional): The number of search results to return per query. Default is 10.
search_type (str, optional): The type of search to perform. Can be "neural" or "web". Default is "neural".
num_sentences (int, optional): The number of sentences to include in the highlights for each result. Default is 3.
highlights_per_url (int, optional): The number of highlights to include per URL. Default is 3.
Return Value
dict: A structured dictionary containing the search results for each query.
Description
The exa_search method allows you to perform web searches using the Exa Search API. It takes one or more search queries and returns a structured dictionary containing the search results for each query.
The method first checks if the EXA_API_KEY environment variable is set. If not, it raises a ValueError. It then constructs the API endpoint URL and request headers, including the API key.
For each query, the method constructs a payload dictionary containing the search parameters, such as the query string, search type, number of results, and highlight settings. It then sends a POST request to the API endpoint with the payload.
If the request is successful, the method restructures the response data into a more user-friendly format. It extracts the relevant information, such as the title, URL, author, and highlights for each search result, and appends it to the structured_data dictionary.
If an error occurs during the request or JSON decoding, the method prints an error message and retries the request up to 3 times with a 1-second delay between attempts. If all attempts fail, it adds an empty result for that query to the structured_data dictionary.
Finally, the method returns the structured_data dictionary containing the search results for each query.
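To make that flow concrete, here is a minimal sketch of how a single-query request might be assembled and retried. The endpoint URL, header names, and payload field names are assumptions for illustration only and are not taken from the actual WebTools implementation.

import os
import time
import requests

def exa_search_sketch(query: str, num_results: int = 10, search_type: str = "neural",
                      num_sentences: int = 3, highlights_per_url: int = 3) -> dict:
    # Fail fast if the API key is missing, as described above.
    api_key = os.environ.get("EXA_API_KEY")
    if not api_key:
        raise ValueError("EXA_API_KEY environment variable is not set")

    # Endpoint and field names are assumptions, not the library's actual values.
    endpoint = "https://api.exa.ai/search"
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    payload = {
        "query": query,
        "type": search_type,
        "numResults": num_results,
        "highlights": {"numSentences": num_sentences, "highlightsPerUrl": highlights_per_url},
    }

    # Retry up to 3 times with a 1-second delay, mirroring the behaviour described above.
    for attempt in range(3):
        try:
            response = requests.post(endpoint, json=payload, headers=headers, timeout=30)
            response.raise_for_status()
            return response.json()
        except (requests.RequestException, ValueError) as exc:
            print(f"Attempt {attempt + 1} failed: {exc}")
            time.sleep(1)
    return {query: []}  # empty result for this query after all attempts fail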
Example Usage
queries = ["Python programming", "Machine learning algorithms"]
results = WebTools.exa_search(queries, num_results=5, search_type="neural", num_sentences=2, highlights_per_url=2)
print(results)
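Assuming the returned dictionary is keyed by query string and each result carries title, url, and highlights entries (these key names are an assumption based on the description above), the results could be consumed like this:

for query, entries in results.items():
    print(f"Results for: {query}")
    for entry in entries:
        # Key names are assumptions; adjust them to the actual result structure.
        print(f"- {entry.get('title')} ({entry.get('url')})")
        for highlight in entry.get("highlights", []):
            print(f"    {highlight}")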
scrape_urls
Method Signature
@staticmethod
def scrape_urls(urls: Union[str, List[str]], include_html: bool = False, include_links: bool = False) -> List[Dict[str, Union[str, List[str]]]]:
Parameters
urls (Union[str, List[str]]): A single URL string or a list of URL strings to scrape.
include_html (bool, optional): Whether to include the HTML content of the scraped pages in the results. Default is False.
include_links (bool, optional): Whether to include the links found on the scraped pages in the results. Default is False.
Return Value
List[Dict[str, Union[str, List[str]]]]: A list of dictionaries, where each dictionary represents a scraped URL and contains the URL, the scraped content, and, optionally, the HTML content and links.
Description
The scrape_urls method allows you to scrape content from one or more URLs. It takes a single URL string or a list of URL strings and returns a list of dictionaries containing the scraped data for each URL.
For each URL, the method sends a GET request with a random user agent header to avoid being blocked by websites. It adds a small delay between requests to avoid overwhelming the server.
If the request is successful, the method parses the HTML content using BeautifulSoup. It removes any script, style, and SVG tags from the parsed content.
The method then constructs a dictionary for each URL, containing the URL and the scraped content. If include_html is set to True, it includes the HTML content of the page in the dictionary. If include_links is set to True, it includes the links found on the page in the dictionary.
If an error occurs during the request or content processing, the method appends a dictionary to the results list with the URL and an error message describing the encountered issue.
Finally, the method returns the list of dictionaries containing the scraped data for each URL.
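The sketch below illustrates that flow for a single URL using requests and BeautifulSoup. The user-agent pool, delay length, and result key names are assumptions for illustration rather than the exact values used by WebTools.

import random
import time
import requests
from bs4 import BeautifulSoup

# A small pool of user agents; the real list used by WebTools is an assumption here.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def scrape_url_sketch(url: str, include_html: bool = False, include_links: bool = False) -> dict:
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    time.sleep(1)  # small delay so the target server is not overwhelmed
    try:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Remove script, style, and SVG tags before extracting the text content.
        for tag in soup(["script", "style", "svg"]):
            tag.decompose()
        result = {"url": url, "content": soup.get_text(separator=" ", strip=True)}
        if include_html:
            result["html"] = response.text
        if include_links:
            result["links"] = [a["href"] for a in soup.find_all("a", href=True)]
        return result
    except requests.RequestException as exc:
        return {"url": url, "error": f"Request failed: {exc}"}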
Example Usage
urls = ["https://www.example.com", "https://www.example.org"]
scraped_data = WebTools.scrape_urls(urls, include_html=True, include_links=True)
print(scraped_data)
get_weather_data
Method Signature
@staticmethod
def get_weather_data(
location: str,
forecast_days: Optional[int] = None,
include_current: bool = True,
include_forecast: bool = True,
include_astro: bool = False,
include_hourly: bool = False,
include_alerts: bool = False,
) -> str:
Parameters
location (str): The location to get the weather report for.
forecast_days (Optional[int], optional): Number of days for the forecast (1-10). If provided, the forecast is included. Default is None.
include_current (bool, optional): Whether to include current conditions in the output. Default is True.
include_forecast (bool, optional): Whether to include forecast data in the output. Default is True.
include_astro (bool, optional): Whether to include astronomical data in the output. Default is False.
include_hourly (bool, optional): Whether to include hourly forecast data in the output. Default is False.
include_alerts (bool, optional): Whether to include weather alerts in the output. Default is False.
Return Value
str: A formatted string with the requested weather data.
Raises
ValueError: If the API key is not set, forecast_days is out of range, or the API returns an error.
requests.RequestException: If there is an error with the API request.
Description
The get_weather_data method allows you to retrieve a detailed weather report for a specified location using the WeatherAPI service. It takes the location as a required parameter and several optional parameters to customize the output.
The method first checks if the WEATHER_API_KEY environment variable is set. If not, it raises a ValueError. It then constructs the API endpoint URL and parameters, including the API key, location, and other options based on the provided parameters.
If forecast_days is provided, it must be between 1 and 10. If include_forecast is True but forecast_days is not set, it defaults to 1 day.
The method sends a GET request to the WeatherAPI endpoint with the constructed parameters. If the request is successful, it retrieves the JSON response data. If the API returns an error, it raises a ValueError with the error message.
The method then formats the weather data into a readable string report. It includes the location details, current conditions (if include_current is True), forecast data (if include_forecast is True), astronomical data (if include_astro is True), hourly forecast data (if include_hourly is True), and weather alerts (if include_alerts is True).
Finally, the method returns the formatted weather report as a string.
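For orientation, here is a condensed sketch of the request-and-validation side of that flow. The endpoint and parameter names follow the public WeatherAPI.com forecast API, and the report formatting is reduced to a single line; treat the details as assumptions rather than the actual WebTools implementation.

import os
from typing import Optional
import requests

def get_weather_sketch(location: str, forecast_days: Optional[int] = None,
                       include_forecast: bool = True, include_alerts: bool = False) -> str:
    # Fail fast if the API key is missing, as described above.
    api_key = os.environ.get("WEATHER_API_KEY")
    if not api_key:
        raise ValueError("WEATHER_API_KEY environment variable is not set")

    # Validate and default the forecast window as described above.
    if forecast_days is not None and not 1 <= forecast_days <= 10:
        raise ValueError("forecast_days must be between 1 and 10")
    if include_forecast and forecast_days is None:
        forecast_days = 1

    # Endpoint and parameter names follow WeatherAPI.com's forecast API (an assumption here).
    params = {
        "key": api_key,
        "q": location,
        "days": forecast_days or 1,
        "alerts": "yes" if include_alerts else "no",
    }
    response = requests.get("https://api.weatherapi.com/v1/forecast.json", params=params, timeout=30)
    data = response.json()
    if "error" in data:
        raise ValueError(data["error"].get("message", "WeatherAPI returned an error"))

    # The real report is much richer; this returns only a one-line summary.
    loc, current = data["location"], data["current"]
    return f"{loc['name']}, {loc['country']}: {current['condition']['text']}, {current['temp_c']}°C"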
Example Usage
location = "New York"
weather_report = WebTools.get_weather_data(location, forecast_days=3, include_current=True, include_forecast=True, include_astro=True, include_hourly=True, include_alerts=True)
print(weather_report)
These are just a few examples of the powerful methods available in the WebTools class. By leveraging these tools, you can easily integrate web search, web scraping, academic paper querying, and weather data retrieval into your applications, enabling you to build feature-rich, data-driven solutions.