- Error Handling: Includes robust error handling:
  - `FileNotFoundError` for the Excel file.
  - Checks for the existence of the required columns ('URL' and 'filename').
  - `requests.exceptions.RequestException` to catch network errors (connection errors, timeouts, etc.) during the download.
  - `response.raise_for_status()` to check for HTTP errors (4xx or 5xx status codes) and raise an exception if one occurs. Without it, an error page returned by the server would be saved as if it were the requested file.
  - General `Exception` catch-all for unexpected errors during file processing.
- Command-Line Arguments: Uses `argparse` to handle command-line arguments for the Excel file path and output directory. This makes the script much more flexible and reusable.
- Output Directory Creation: `os.makedirs(output_dir, exist_ok=True)` creates the output directory if it doesn't exist. `exist_ok=True` prevents an error if the directory already exists.
- Chunked Download: Downloads files in chunks (`response.iter_content(chunk_size=8192)`) to handle large files efficiently and avoid loading the entire file into memory at once.
- Clearer Output: Prints informative messages about downloaded files and any errors that occur.
- Handles Missing Values: Checks for `pd.isna(url)` or `pd.isna(filename)` to skip rows with missing URLs or filenames, preventing crashes.
- Filepath Construction: Uses `os.path.join` to construct filepaths correctly, ensuring cross-platform compatibility.
- Comments and Readability: Well-commented and formatted for better understanding.
- `if __name__ == "__main__":` block: Ensures that the `download_files_from_excel` function is only called when the script is run directly (not when it's imported as a module).
- Excel Row Indexing: The error message for skipping rows now correctly indicates the Excel row number (which is 1-indexed).
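
The directory creation, path construction, chunked download, and error handling listed above might fit together roughly as follows. This is a minimal sketch and not the script itself: the helper name `download_one`, its parameters, and the 30-second timeout are assumptions made for illustration.

```python
import os

import requests


def download_one(url, filename, output_dir, timeout=30):
    """Download url into output_dir/filename; return the path, or None on failure.

    Hypothetical helper: the name, signature, and timeout are illustrative.
    """
    os.makedirs(output_dir, exist_ok=True)  # no error if the directory already exists
    filepath = os.path.join(output_dir, filename)  # portable path construction
    try:
        # stream=True defers the body so it can be read piece by piece
        with requests.get(url, stream=True, timeout=timeout) as response:
            response.raise_for_status()  # raises HTTPError on 4xx/5xx responses
            with open(filepath, "wb") as fh:
                for chunk in response.iter_content(chunk_size=8192):
                    if chunk:  # skip empty keep-alive chunks
                        fh.write(chunk)
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, and the HTTPError raised above
        print(f"Error downloading {url}: {exc}")
        return None
    print(f"Downloaded {url} -> {filepath}")
    return filepath
```

Because `HTTPError` is a subclass of `RequestException`, a single `except` clause handles both network failures and bad status codes. Note that this sketch does not remove a partially written file if the connection drops midway.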
- Install Libraries: `pip install pandas requests`. Reading `.xlsx` files with pandas also requires `openpyxl` (`pip install openpyxl`) if it is not already installed.
- Save the script: Save the code as a Python file (e.g., `download_script.py`).
- Prepare your Excel file: Create an Excel file with columns named "URL" and "filename". The "URL" column should contain the URLs of the files you want to download, and the "filename" column should contain the desired filenames for the downloaded files.
- Run the script: `python download_script.py your_excel_file.xlsx output_directory`
  Replace `your_excel_file.xlsx` with the actual path to your Excel file and `output_directory` with the desired output directory.
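
To show how the two positional arguments in the run command could be parsed, here is a small `argparse` sketch. The function name `build_parser` and the argument names `excel_file` and `output_dir` are assumptions; the actual script may name them differently.

```python
import argparse


def build_parser():
    """Build a parser with two positional arguments (names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Download files listed in an Excel sheet."
    )
    parser.add_argument(
        "excel_file",
        help="path to the Excel file with 'URL' and 'filename' columns",
    )
    parser.add_argument(
        "output_dir",
        help="directory where downloaded files are saved",
    )
    return parser


# Equivalent to: python download_script.py your_excel_file.xlsx output_directory
args = build_parser().parse_args(["your_excel_file.xlsx", "output_directory"])
print(args.excel_file, args.output_dir)
```

In a real script, `parse_args()` would be called without a list so that it reads `sys.argv`, typically inside the `if __name__ == "__main__":` block.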
Example Excel file (`data.xlsx`):

URL | filename |
---|---|
https://www.easygifanimator.net/images/samples/video-to-gif-sample.gif | sample.gif |
https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf | dummy.pdf |
https://www.sample-videos.com/img/Sample-jpg-image-50kb.jpg | image.jpg |
missing_file.txt | |
https://www.example.com/doesnotexist | nonexistent.txt |
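
As a rough illustration of the missing-value check, the sketch below builds the example table in memory (mirroring the rows as printed, with the empty cell as a missing filename) and separates rows to download from rows to skip. The function name `split_rows` is hypothetical, and the row-number arithmetic assumes the sheet's first row is the header.

```python
import pandas as pd

# In-memory stand-in for data.xlsx, mirroring the example table as printed.
df = pd.DataFrame(
    {
        "URL": [
            "https://www.easygifanimator.net/images/samples/video-to-gif-sample.gif",
            "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
            "https://www.sample-videos.com/img/Sample-jpg-image-50kb.jpg",
            "missing_file.txt",
            "https://www.example.com/doesnotexist",
        ],
        "filename": ["sample.gif", "dummy.pdf", "image.jpg", None, "nonexistent.txt"],
    }
)


def split_rows(frame):
    """Return (rows_to_download, skipped_excel_rows) for a URL/filename frame."""
    to_download, skipped = [], []
    for index, row in frame.iterrows():
        if pd.isna(row["URL"]) or pd.isna(row["filename"]):
            # Assumption: Excel row 1 is the header, so DataFrame index 0 is Excel row 2.
            skipped.append(int(index) + 2)
            continue
        to_download.append((row["URL"], row["filename"]))
    return to_download, skipped


to_download, skipped = split_rows(df)
print(f"{len(to_download)} rows to download; skipped Excel rows: {skipped}")
```

The last row passes this check because both cells are filled; its failure would only surface later, when the HTTP request returns an error status.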
This improved version reports failed downloads instead of crashing, streams large files in chunks rather than holding them in memory, and prints clear messages for each file, which makes it more reliable for real-world use.