Monitoring SFTP server updates with Python


Introduction

Keeping an eye on an SFTP server becomes tedious when you need to check it for updates regularly. Bash is a powerful tool, but it runs into its limits on tasks like detecting new files and directories on an SFTP server. In this post, we'll look at how Python and its libraries handle this job instead. We'll walk through code that connects to an SFTP server, checks for updates, looks up each new file name on Google, and downloads the entries you choose.

Bash alone could not check the SFTP server for new files and directories, so Python was brought in. Its broader capabilities made it possible to add features such as connecting to the SFTP server, checking for new files, and searching Google for book titles. The result offers more functionality and reliability than the Bash script.

Why Bash Alone Wasn't Enough

Initially, the goal was to write a Bash script that watches the SFTP server for new files or directories. Bash, however, has no SFTP support of its own. It has to drive an external client, such as the sftp command in batch mode, and parse the text that client prints. That becomes fragile once the script also has to compare listings between runs, read modification times, and look up each new book title on the internet.

Leveraging Python for Enhanced Functionality

Python offers libraries that simplify interacting with SFTP servers, handling files, and making HTTP requests. The script relies on:

  1. Paramiko: a Python implementation of the SSHv2 protocol, including an SFTP client for listing and downloading remote files.
  2. Requests: a simple HTTP library, used here to send the Google search request.
  3. re: Python's built-in regular-expression module, used to extract book titles from file and directory names.
  4. tqdm: a progress-bar library, used to show download progress.
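The regular-expression step is easy to try on its own. This is a minimal sketch that assumes the same title pattern the script uses. The helper names extract_title and google_search_url are hypothetical. Unlike the script, the sketch only builds the search URL and does not send a request.

```python
import re
from urllib.parse import quote_plus

# Same pattern as in the script: title, optional "(...)" part, optional "Audiobook" suffix
TITLE_PATTERN = re.compile(r"^(.*?)(\s*\(.*\))?(\s*Audiobook)?$")


def extract_title(entry_name: str) -> str:
    """Hypothetical helper: return the part of the name before "(...)" / "Audiobook"."""
    match = TITLE_PATTERN.match(entry_name)
    # The pattern matches any single-line name; fall back to the raw name otherwise
    return match.group(1).strip() if match else entry_name


def google_search_url(title: str) -> str:
    """Hypothetical helper: build a Google search URL with the title URL-encoded."""
    return f"https://www.google.com/search?q={quote_plus(title)}"


if __name__ == "__main__":
    title = extract_title("Dune (1965) Audiobook")
    print(title)                     # Dune
    print(google_search_url(title))  # https://www.google.com/search?q=Dune
```

Every part after the title is optional, so names without a year or tag come back unchanged.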

The Code Explained

Let's dive into the Python code that makes all this possible.

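  1. Configuration File: Server credentials and paths live in a JSON configuration file. This keeps them out of the code and lets you change them without editing the script. The password is still stored in plain text, so make the file readable only by its owner (for example with chmod 600 config.json).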

  2. Connecting to the SFTP Server: Using the paramiko library, we establish a secure connection to the SFTP server and list the directory contents.

  3. Checking for New Files: We compare the current directory contents with a previously saved state to identify any new files or directories.

  4. Performing Google Search: For each new entry, we extract the book title, dropping a parenthesized part such as the year and a trailing "Audiobook" tag. We then build a Google search URL for that title and print the link if Google answers the request.

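  5. Avoiding Repetitions: Entries that reduce to the same book title are handled only once; a set keeps track of the titles already processed.

  6. Downloading: After the list is printed, you choose entries by number (or all of them). Files are downloaded with a progress bar, and directories are copied recursively.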
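Steps 3 and 5 boil down to two things: a set difference against a saved listing, and a set of titles already seen. The sketch below separates that logic from the SFTP connection so it can be tried locally. The helpers find_new_entries and first_per_title are hypothetical. The state file format (one name per line) matches the script. Names are passed in as a plain list rather than coming from sftp.listdir_attr.

```python
import os
import tempfile


def find_new_entries(current_names, state_file):
    """Return names absent from state_file, then save current_names as the new state.

    Hypothetical helper: the state file holds one name per line, as in the
    script, and a missing file means a first run where everything is new.
    """
    if os.path.exists(state_file):
        with open(state_file, "r", encoding="utf-8") as f:
            previous = {line.rstrip("\n") for line in f}
    else:
        previous = set()

    # Sorting keeps the numbering shown to the user stable
    new_entries = sorted(set(current_names) - previous)

    with open(state_file, "w", encoding="utf-8") as f:
        f.writelines(name + "\n" for name in current_names)
    return new_entries


def first_per_title(entries, title_of):
    """Keep only the first entry for each title, like the script's processed_titles set."""
    seen = set()
    kept = []
    for entry in entries:
        title = title_of(entry)
        if title not in seen:
            seen.add(title)
            kept.append(entry)
    return kept


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = os.path.join(tmp, "previous_check.txt")
        print(find_new_entries(["a", "b"], state))       # ['a', 'b'] (first run)
        print(find_new_entries(["a", "b", "c"], state))  # ['c']
        print(find_new_entries(["a", "b", "c"], state))  # []
```

Stripping only the trailing newline (rather than all whitespace) means a name ending in a space is not reported as new on every run.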

Here is the complete Python code:

 
import json
import os
import posixpath
import re
import stat
import time
from urllib.parse import quote_plus

import paramiko
import requests
from tqdm import tqdm

# Read the configuration file; assumes config.json sits next to this script,
# so it is found even when the script is launched from another directory
CONFIG_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.json')
with open(CONFIG_PATH, 'r', encoding='utf-8') as f:
    config = json.load(f)

hostname = config['hostname']
port = config['port']
username = config['username']
password = config['password']
remote_path = config['remote_path']

# Local directory holding the listing saved by the previous check
local_temp_path = '/home/vaidotak/python/ftp_project/temp/'
# Local directory that downloads are saved to
download_path = '/home/vaidotak/Atsiuntimai/'

# Book title, then an optional "(...)" part (e.g. the year) and an optional "Audiobook" suffix
TITLE_PATTERN = re.compile(r"^(.*?)(\s*\(.*\))?(\s*Audiobook)?$")


def connect_sftp():
    """Connect to the server and return (ssh_client, sftp_client).

    The server's host key must already be in ~/.ssh/known_hosts (connect once
    with ssh or sftp to add it); unknown or changed keys are rejected.
    """
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.RejectPolicy())
    # Log in with the password from config.json only (no SSH agent or key files)
    client.connect(hostname, port=port, username=username, password=password,
                   allow_agent=False, look_for_keys=False)
    return client, client.open_sftp()


def search_google(title):
    """Build a Google search URL for the title; return it if Google answers with HTTP 200."""
    search_url = f"https://www.google.com/search?q={quote_plus(title)}"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    try:
        # The timeout keeps the script from hanging if Google does not respond
        response = requests.get(search_url, headers=headers, timeout=10)
        if response.status_code == 200:
            return search_url
        print(f"Google request failed: HTTP {response.status_code}")
    except requests.RequestException as e:
        print(f"Error during the search: {e}")
    return None


def check_new_files_and_dirs():
    """Print the entries of remote_path that are new since the last run.

    Returns a dict mapping each printed number to its entry name.
    """
    client, sftp = connect_sftp()
    try:
        # Remote listing keyed by name, so attributes such as mtime are easy to look up
        current_entries = {entry.filename: entry for entry in sftp.listdir_attr(remote_path)}
    finally:
        sftp.close()
        client.close()

    # Load the names saved by the previous run (none on the first run)
    os.makedirs(local_temp_path, exist_ok=True)
    previous_check_file = os.path.join(local_temp_path, 'previous_check.txt')
    if os.path.exists(previous_check_file):
        with open(previous_check_file, 'r', encoding='utf-8') as f:
            previous_names = {line.rstrip('\n') for line in f}
    else:
        previous_names = set()

    # New files and directories, sorted so the numbering is predictable
    new_entries = sorted(set(current_entries) - previous_names)

    processed_titles = set()
    indexed_new_entries = {}

    if new_entries:
        print("New files or directories:")
        for index, entry in enumerate(new_entries, start=1):
            # Keep the book title before "(...)" and drop a trailing "Audiobook" tag
            match = TITLE_PATTERN.match(entry)
            book_title = match.group(1).strip() if match else entry
            if book_title in processed_titles:
                continue  # another entry with the same title was already listed
            processed_titles.add(book_title)

            google_link = search_google(book_title)
            file_mtime = time.strftime('%Y-%m-%d %H:%M:%S',
                                       time.localtime(current_entries[entry].st_mtime))
            print(f"{index}. \033[92m{entry}\033[0m ({file_mtime})")
            print(google_link or f"No Google link found for: {book_title}")
            indexed_new_entries[index] = entry

            # Wait between Google requests
            time.sleep(1)
    else:
        print("No new files or directories.")

    # Save the current listing for the next check
    with open(previous_check_file, 'w', encoding='utf-8') as f:
        for name in current_entries:
            f.write(name + '\n')

    return indexed_new_entries


def download_file(sftp, remote_file, local_file):
    """Download one file, showing a tqdm progress bar."""
    file_size = sftp.stat(remote_file).st_size
    with tqdm(total=file_size, unit='B', unit_scale=True,
              desc=posixpath.basename(remote_file)) as pbar:
        # paramiko calls the callback with (bytes transferred so far, total bytes)
        sftp.get(remote_file, local_file,
                 callback=lambda done, total: pbar.update(done - pbar.n))


def sftp_walk(sftp, remote_dir):
    """Yield (dirpath, dirnames, filenames) top-down, like os.walk, for a remote directory."""
    dirnames, filenames = [], []
    for attr in sftp.listdir_attr(remote_dir):
        if stat.S_ISDIR(attr.st_mode):
            dirnames.append(attr.filename)
        else:
            filenames.append(attr.filename)
    yield remote_dir, dirnames, filenames
    for dirname in dirnames:
        yield from sftp_walk(sftp, posixpath.join(remote_dir, dirname))


def download_files(sftp, files_to_download, indexed_new_entries, download_path):
    """Download the selected entries; directories are copied recursively."""
    os.makedirs(download_path, exist_ok=True)
    for file_index in files_to_download:
        # Look the entry up by its printed number (numbers can have gaps)
        file_entry = indexed_new_entries.get(file_index)
        if file_entry is None:
            print(f"No entry with number {file_index}, skipping.")
            continue
        # Remote paths always use "/", so build them with posixpath
        remote_entry_path = posixpath.join(remote_path, file_entry)

        try:
            if stat.S_ISDIR(sftp.lstat(remote_entry_path).st_mode):
                # Mirror the directory tree under download_path
                for dirpath, _dirnames, filenames in sftp_walk(sftp, remote_entry_path):
                    local_dirpath = os.path.join(download_path,
                                                 posixpath.relpath(dirpath, remote_path))
                    os.makedirs(local_dirpath, exist_ok=True)
                    for filename in filenames:
                        download_file(sftp, posixpath.join(dirpath, filename),
                                      os.path.join(local_dirpath, filename))
            else:
                download_file(sftp, remote_entry_path, os.path.join(download_path, file_entry))
            print(f"{file_entry} downloaded successfully.")
        except (OSError, paramiko.SSHException) as e:
            print(f"Error downloading {file_entry}: {e}")


def main():
    try:
        indexed_new_entries = check_new_files_and_dirs()
    except (OSError, paramiko.SSHException) as e:
        print(f"Error: {e}")
        return
    if not indexed_new_entries:
        return

    selection = input("Enter the numbers of the entries to download (e.g. 1, 4, 18) "
                      "or 'all' to download everything: ").strip()
    if selection.lower() == 'all':
        files_to_download = list(indexed_new_entries)
    else:
        try:
            files_to_download = [int(part) for part in selection.split(',') if part.strip()]
        except ValueError:
            print("Please enter comma-separated numbers or 'all'.")
            return
    if not files_to_download:
        return

    # Open a fresh connection for the downloads
    client, sftp = connect_sftp()
    try:
        download_files(sftp, files_to_download, indexed_new_entries, download_path)
    finally:
        sftp.close()
        client.close()


if __name__ == "__main__":
    main()
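The download prompt accepts either a keyword or comma-separated numbers. The numbers a user types are the ones printed next to each entry. Those can have gaps when entries with a duplicate title are skipped, so they are best looked up by number rather than by list position. Below is a standalone sketch of that parsing. The parse_selection helper and the "all" keyword are hypothetical. Unlike the script, the sketch also rejects numbers that were never shown.

```python
def parse_selection(selection, shown):
    """Turn input such as "1, 4, 18" or "all" into a list of entry numbers.

    `shown` maps each printed number to its entry name. A part that is not an
    integer, or a number that was never shown, raises ValueError.
    """
    text = selection.strip()
    if text.lower() == "all":
        return sorted(shown)
    numbers = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and empty input
        number = int(part)  # raises ValueError for non-numbers
        if number not in shown:
            raise ValueError(f"no entry with number {number}")
        if number not in numbers:  # ignore repeated numbers
            numbers.append(number)
    return numbers


if __name__ == "__main__":
    shown = {1: "Dune (1965)", 3: "Emma", 4: "Ulysses"}  # 2 was a skipped duplicate
    print(parse_selection("3, 1", shown))  # [3, 1]
    print(parse_selection("all", shown))   # [1, 3, 4]
```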
The config.json file (values redacted):

{
    "hostname": "**.***.***.**",
    "port": 22,
    "username": "*******",
    "password": "******",
    "remote_path": "/***********/********"
}
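The script reads these keys with plain dictionary lookups, so a missing key only shows up as a bare KeyError. Below is a sketch of a stricter loader. The names validate_config, load_config and REQUIRED_KEYS are hypothetical. The checks themselves (all five keys present, port an integer from 1 to 65535) are assumptions about what a valid file looks like.

```python
import json

# Keys the monitoring script reads from config.json
REQUIRED_KEYS = ("hostname", "port", "username", "password", "remote_path")


def validate_config(config):
    """Raise ValueError with a readable message if the config dict is unusable."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config.json is missing: {', '.join(missing)}")
    port = config["port"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(port, int) or isinstance(port, bool) or not 0 < port < 65536:
        raise ValueError(f"'port' must be an integer between 1 and 65535, got {port!r}")
    return config


def load_config(path="config.json"):
    """Read the JSON configuration file and validate it."""
    with open(path, "r", encoding="utf-8") as f:
        return validate_config(json.load(f))
```

Keeping the validation separate from the file reading makes it easy to check without touching the disk.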
  
And the Bash script that runs it with the project's virtual environment:

#!/bin/bash

# Activate the virtual environment
source /home/vaidotak/python/.venv/bin/activate

# Run the Python script
/home/vaidotak/python/.venv/bin/python /home/vaidotak/python/ftp/ftp.py
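When a directory is selected for download, the script walks the remote tree and recreates it under the local download folder. That path mapping can be checked without a server. The sketch below uses a hypothetical FakeSFTP class in place of paramiko's SFTPClient; it provides only listdir_attr, returning objects with filename and st_mode. The walk mirrors the script's sftp_walk. planned_downloads is a hypothetical helper that lists the (remote, local) file pairs instead of downloading them. Local paths are built with posixpath here so the result is the same on any OS.

```python
import posixpath
import stat
from types import SimpleNamespace


class FakeSFTP:
    """Hypothetical stand-in for paramiko.SFTPClient, backed by a dict of directories."""

    def __init__(self, tree):
        # tree maps a directory path to a list of (name, is_dir) pairs
        self.tree = tree

    def listdir_attr(self, path):
        # Objects with .filename and .st_mode, the two attributes the walk uses
        return [
            SimpleNamespace(filename=name,
                            st_mode=(stat.S_IFDIR if is_dir else stat.S_IFREG) | 0o755)
            for name, is_dir in self.tree[path]
        ]


def sftp_walk(sftp, remote_dir):
    """Yield (dirpath, dirnames, filenames) top-down, like os.walk, for a remote tree."""
    dirnames, filenames = [], []
    for attr in sftp.listdir_attr(remote_dir):
        if stat.S_ISDIR(attr.st_mode):
            dirnames.append(attr.filename)
        else:
            filenames.append(attr.filename)
    yield remote_dir, dirnames, filenames
    for dirname in dirnames:
        yield from sftp_walk(sftp, posixpath.join(remote_dir, dirname))


def planned_downloads(sftp, remote_root, entry, local_root):
    """List (remote_file, local_file) pairs for one directory entry under remote_root."""
    pairs = []
    for dirpath, _dirnames, filenames in sftp_walk(sftp, posixpath.join(remote_root, entry)):
        # Keep each directory's path relative to remote_root, as the script does
        local_dir = posixpath.join(local_root, posixpath.relpath(dirpath, remote_root))
        for name in filenames:
            pairs.append((posixpath.join(dirpath, name), posixpath.join(local_dir, name)))
    return pairs
```

For example, a remote "/srv/books/Dune/extras/cover.jpg" maps to "/tmp/dl/Dune/extras/cover.jpg" when remote_root is "/srv/books" and local_root is "/tmp/dl".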

Conclusion

Using Python, we built a more robust and flexible way to monitor an SFTP server for new files and directories. Besides reporting what is new, the script prints a Google search link for each new book title. It can also download the entries you select, with a progress bar, so it doubles as a quick lookup and download tool.
