Monitoring SFTP server updates with Python


Introduction

Managing files on an SFTP server can be a daunting task, especially when you need to monitor updates regularly. While Bash scripting is a powerful tool, it has its limitations, particularly when dealing with more complex tasks like checking for new files and directories on an SFTP server. In this blog post, we'll explore how Python, with its extensive libraries and functionalities, can be leveraged to perform this task efficiently. We'll also walk through the code that connects to an SFTP server, checks for updates, and even performs a Google search for specific file names.

Bash alone was not enough to check for file and directory updates on the SFTP server, so Python was brought in. Python offers broader capabilities, which made it possible to add features such as connecting to the SFTP server, checking for new files, and searching Google for book titles. The Python code connects to the server, checks for new files, and runs Google searches, providing more functionality and reliability than the Bash script.

Why Bash Alone Wasn't Enough

Initially, the goal was to create a Bash script to monitor the SFTP server for any new files or directories. However, Bash scripts have certain limitations, particularly in handling complex file operations and network interactions. These limitations made it difficult to efficiently check for updates and perform additional actions like searching for book titles on the internet.

Leveraging Python for Enhanced Functionality

Python offers a wide range of libraries and modules that simplify the task of interacting with SFTP servers, handling files, and making HTTP requests. By using Python, we can take advantage of:

  1. Paramiko: A robust library for the SSH2 protocol, allowing us to connect to and interact with SFTP servers easily (a minimal connection sketch follows this list).
  2. Requests: A simple HTTP library for making requests to web services, such as Google search.
  3. re: Python's built-in regular-expression module, for parsing and cleaning up file names efficiently.
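
To make the Paramiko part concrete before the full script, here is a minimal, self-contained sketch (not the final code) that opens an SFTP session and lists a remote directory; the host name, credentials, and path are placeholders:

python
import paramiko

# Placeholder connection details -- replace with your own server settings
hostname, port = "sftp.example.com", 22
username, password = "user", "secret"
remote_path = "/books"

# Open an SSH transport and start an SFTP session on top of it
transport = paramiko.Transport((hostname, port))
transport.connect(username=username, password=password)
sftp = paramiko.SFTPClient.from_transport(transport)

# listdir_attr() returns metadata (size, mtime) along with each file name
for entry in sftp.listdir_attr(remote_path):
    print(entry.filename, entry.st_mtime)

sftp.close()
transport.close()

The full script below uses the same Transport/SFTPClient pair for both the update check and the downloads.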

The Code Explained

Let's dive into the Python code that makes all this possible.

  1. Configuration File: We store server credentials and paths in a JSON configuration file for better security and manageability.

  2. Connecting to the SFTP Server: Using the paramiko library, we establish a secure connection to the SFTP server and list the directory contents.

  3. Checking for New Files: We compare the current directory contents with a previously saved state to identify any new files or directories.

  4. Performing Google Search: For any new entries, we strip unnecessary details from the file name and perform a Google search to find more information about the book titles (see the short sketch after this list).

  5. Avoiding Repetitions: We ensure that no duplicate entries are processed by using a set to keep track of processed titles.
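
Before the full listing, here is a short, self-contained sketch of steps 3-5: the set difference that spots new entries, the regular expression that trims a file name down to a bare book title, and the set that prevents duplicate lookups. The file names below are made up purely for illustration:

python
import re

# Hypothetical listings: what we saw last time vs. what the server shows now
previous = ["Old Title (2019) Audiobook"]
current = ["Old Title (2019) Audiobook", "New Title (2023) Audiobook", "New Title (2023)"]

# Step 3: anything present now but not before counts as new
new_entries = set(current) - set(previous)

# Steps 4-5: strip the "(year)" part and the "Audiobook" tag, skip titles already handled
processed_titles = set()
for entry in new_entries:
    match = re.match(r"^(.*?)(\s*\(.*\))?(\s*Audiobook)?$", entry)
    if match:
        title = match.group(1).strip()
        if title not in processed_titles:
            processed_titles.add(title)
            print(title)  # both new sample entries reduce to "New Title", so it prints once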

Here is the complete Python code:

python
import paramiko
import os
import json
import re
import requests
from urllib.parse import quote_plus
import time
import stat
from tqdm import tqdm

# Read the configuration file
with open('config.json', 'r') as f:
    config = json.load(f)

hostname = config['hostname']
port = config['port']
username = config['username']
password = config['password']
remote_path = config['remote_path']

# Local temporary directory and download target
local_temp_path = '/home/vaidotak/python/ftp_project/temp/'
download_path = '/home/vaidotak/Atsiuntimai/'


# Google search function
def search_google(title):
    try:
        search_url = f"https://www.google.com/search?q={quote_plus(title)}"
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }
        response = requests.get(search_url, headers=headers)
        if response.status_code == 200:
            return search_url
        else:
            print(f"Klaida siunčiant užklausą Google: {response.status_code}")
    except Exception as e:
        print(f"Klaida paieškos metu: {e}")
    return None


# Connect to the SFTP server and check the remote directory for new entries
def check_new_files_and_dirs():
    try:
        # Connect to the SFTP server
        transport = paramiko.Transport((hostname, port))
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)

        # Get the remote directory listing
        current_files_and_dirs = sftp.listdir_attr(remote_path)

        # Check whether data from a previous run exists
        previous_check_file = os.path.join(local_temp_path, 'previous_check.txt')
        if os.path.exists(previous_check_file):
            with open(previous_check_file, 'r') as f:
                previous_data = f.readlines()
            previous_files_and_dirs = [line.strip() for line in previous_data]
        else:
            previous_files_and_dirs = []

        # Build the current list of names
        current_files_and_dirs_names = [entry.filename for entry in current_files_and_dirs]

        # New files and directories
        new_entries = set(current_files_and_dirs_names) - set(previous_files_and_dirs)
        processed_titles = set()
        indexed_new_entries = []

        if new_entries:
            print("Yra naujų failų ar katalogų:")
            for index, entry in enumerate(new_entries, start=1):
                # Extract the book title up to the parenthesis, dropping the year and the "Audiobook" tag
                match = re.match(r"^(.*?)(\s*\(.*\))?(\s*Audiobook)?$", entry)
                if match:
                    book_title = match.group(1).strip()
                    if book_title not in processed_titles:
                        processed_titles.add(book_title)
                        google_link = search_google(book_title)
                        mtime = current_files_and_dirs[current_files_and_dirs_names.index(entry)].st_mtime
                        file_mtime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))
                        print(f"{index}. \033[92m{entry}\033[0m ({file_mtime})")
                        if google_link:
                            print(google_link)
                        else:
                            print(f"Nerasta nuoroda Google sistemoje: {book_title}")
                        indexed_new_entries.append((index, entry))
                        # Wait between requests to avoid hammering Google
                        time.sleep(1)
        else:
            print("Naujų failų ar katalogų nėra.")

        # Save the current listing for the next check
        with open(previous_check_file, 'w') as f:
            for entry in current_files_and_dirs_names:
                f.write(entry + '\n')

        # Disconnect
        sftp.close()
        transport.close()

        return indexed_new_entries, current_files_and_dirs
    except Exception as e:
        print(f"Klaida: {e}")
        return [], []


# Download function with a progress bar
def download_files(sftp, files_to_download, download_path):
    for file_index in files_to_download:
        # Look up the entry name by its menu number (indexed_new_entries is set globally in main())
        file_entry = indexed_new_entries[file_index - 1][1]
        remote_file_path = os.path.join(remote_path, file_entry)
        local_file_path = os.path.join(download_path, file_entry)
        try:
            # Check whether the entry is a directory
            file_info = sftp.lstat(remote_file_path)
            if stat.S_ISDIR(file_info.st_mode):
                # Create the local directory if it does not exist
                if not os.path.exists(local_file_path):
                    os.makedirs(local_file_path)
                # Download the directory contents recursively
                for dirpath, dirnames, filenames in sftp_walk(sftp, remote_file_path):
                    # Create the corresponding local directories
                    local_dirpath = os.path.join(download_path, os.path.relpath(dirpath, remote_path))
                    if not os.path.exists(local_dirpath):
                        os.makedirs(local_dirpath)
                    for filename in filenames:
                        remote_file = os.path.join(dirpath, filename)
                        local_file = os.path.join(local_dirpath, filename)
                        file_size = sftp.stat(remote_file).st_size
                        with open(local_file, 'wb') as f, sftp.file(remote_file, 'rb') as remote_file_handle:
                            with tqdm(total=file_size, unit='B', unit_scale=True, desc=filename) as pbar:
                                while True:
                                    data = remote_file_handle.read(1024)
                                    if not data:
                                        break
                                    f.write(data)
                                    pbar.update(len(data))
            else:
                file_size = file_info.st_size
                with open(local_file_path, 'wb') as f, sftp.file(remote_file_path, 'rb') as remote_file_handle:
                    with tqdm(total=file_size, unit='B', unit_scale=True, desc=file_entry) as pbar:
                        while True:
                            data = remote_file_handle.read(1024)
                            if not data:
                                break
                            f.write(data)
                            pbar.update(len(data))
            print(f"Failas {file_entry} sėkmingai atsisiųstas.")
        except Exception as e:
            print(f"Klaida atsisiunčiant failą {file_entry}: {e}")


# Helper that walks a remote directory tree recursively (like os.walk, but over SFTP)
def sftp_walk(sftp, remotepath):
    path = remotepath
    files = []
    folders = []
    for f in sftp.listdir_attr(remotepath):
        if stat.S_ISDIR(f.st_mode):
            folders.append(f.filename)
        else:
            files.append(f.filename)
    yield path, folders, files
    for folder in folders:
        new_path = os.path.join(remotepath, folder)
        for x in sftp_walk(sftp, new_path):
            yield x


# Main function
def main():
    global indexed_new_entries
    indexed_new_entries, current_files_and_dirs = check_new_files_and_dirs()
    if indexed_new_entries:
        download_path = '/home/vaidotak/Atsiuntimai/'
        # 'visus' means 'all' in the prompt below
        selection = input("Įveskite failų numerius, kuriuos norite atsisiųsti (pvz.: 1, 4, 18) arba 'visus' atsisiųsti visus: ")
        if selection.lower() == 'visus':
            files_to_download = [entry[0] for entry in indexed_new_entries]
        else:
            files_to_download = list(map(int, selection.split(',')))
        # Reconnect to the SFTP server for downloading
        transport = paramiko.Transport((hostname, port))
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        download_files(sftp, files_to_download, download_path)
        sftp.close()
        transport.close()


if __name__ == "__main__":
    main()
And here is the config.json file the script reads its settings from (credentials masked):
json
{
    "hostname": "**.***.***.**",
    "port": 22,
    "username": "*******",
    "password": "******",
    "remote_path": "/***********/********"
}
And finally, the Bash wrapper script that activates the virtual environment and launches the Python script:
bash
#!/bin/bash

# Activate the virtual environment
source /home/vaidotak/python/.venv/bin/activate

# Run the Python script
/home/vaidotak/python/.venv/bin/python /home/vaidotak/python/ftp/ftp.py

Conclusion

Using Python, we have created a more robust and flexible solution for monitoring an SFTP server for new files and directories. The added functionality of performing Google searches for specific file names enhances the utility of the script, making it not only a monitoring tool but also a powerful information retriever.
