Monitoring SFTP server updates with Python
Introduction
Managing files on an SFTP server can be a daunting task, especially when you need to monitor updates regularly. While Bash scripting is a powerful tool, it has its limitations, particularly for more complex tasks like checking for new files and directories on an SFTP server. In this blog post, we'll explore how Python and its libraries can handle this task efficiently. We'll also walk through the code that connects to an SFTP server, checks for updates, and builds a Google search link for the book title extracted from each new file name.
In short: Bash alone could not check the SFTP server for new or updated files and directories, so Python was brought in. Python's broader capabilities made it possible to add features such as connecting to the SFTP server, checking for new files, and searching Google for book titles. The Python code connects to the server, checks for new files, and runs the Google searches, which gives more functionality and reliability than the Bash script.
Why Bash Alone Wasn't Enough
Initially, the goal was to create a Bash script to monitor the SFTP server for any new files or directories. However, Bash scripts have certain limitations, particularly in handling complex file operations and network interactions. These limitations made it difficult to efficiently check for updates and perform additional actions like searching for book titles on the internet.
Leveraging Python for Enhanced Functionality
Python offers a wide range of libraries and modules that simplify the task of interacting with SFTP servers, handling files, and making HTTP requests. By using Python, we can take advantage of:
- Paramiko: A robust library for the SSH2 protocol, allowing us to connect to and interact with SFTP servers easily.
- Requests: A simple HTTP library for making requests to web services, such as Google search.
- Regex: Python's built-in re module, for parsing and manipulating strings efficiently.
- tqdm: A progress bar library, used here to show download progress.
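As a small illustration of how a title becomes a search link, the sketch below builds a Google search URL with the standard library's quote_plus, the same encoding step the script applies before passing the URL to Requests. The helper name build_search_url and the sample title are invented for this example, and no network request is made.

```python
from urllib.parse import quote_plus


def build_search_url(title: str) -> str:
    """Return a Google search URL for a title (hypothetical helper)."""
    # quote_plus turns spaces into '+' and escapes characters such as '&'
    return f"https://www.google.com/search?q={quote_plus(title)}"


if __name__ == "__main__":
    # The sample title is made up for illustration
    print(build_search_url("War & Peace"))
```

Encoding the title this way keeps characters like spaces and ampersands from breaking the query string.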
The Code Explained
Let's dive into the Python code that makes all this possible.
Configuration File: We store server credentials and paths in a JSON configuration file, which keeps them out of the source code and makes them easier to manage.
Connecting to the SFTP Server: Using the paramiko library, we establish a secure connection to the SFTP server and list the directory contents.
Checking for New Files: We compare the current directory contents with a previously saved state to identify any new files or directories.
Performing Google Search: For any new entries, we strip the parenthesized part of the name (such as the year) and the "Audiobook" tag, then perform a Google search to find more information about the book title.
Avoiding Repetitions: We ensure that no duplicate entries are processed by using a set to keep track of processed titles.
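The last three steps can be sketched without a server, using plain lists of names. This is only an illustration under a few assumptions: the helper names extract_title and find_new_titles are invented here, the sample names are made up, and the new entries are sorted so the result is predictable. The regular expression is the one used in the script.

```python
import re

# Same pattern as the script: title, optional "(...)" part, optional "Audiobook" tag
TITLE_PATTERN = re.compile(r"^(.*?)(\s*\(.*\))?(\s*Audiobook)?$")


def extract_title(entry_name: str) -> str:
    """Return the book title portion of a file or directory name."""
    match = TITLE_PATTERN.match(entry_name)
    return match.group(1).strip() if match else entry_name.strip()


def find_new_titles(current, previous):
    """Return (entry, title) pairs for names not seen before, one per title."""
    # Set difference gives the names that are in the current listing only
    new_entries = sorted(set(current) - set(previous))
    seen_titles = set()
    result = []
    for entry in new_entries:
        title = extract_title(entry)
        if title not in seen_titles:  # skip repeated titles
            seen_titles.add(title)
            result.append((entry, title))
    return result


if __name__ == "__main__":
    # Hypothetical directory listings
    previous = ["Old Book"]
    current = ["Old Book", "Dune (1965) Audiobook", "Dune (2021)", "Emma"]
    for entry, title in find_new_titles(current, previous):
        print(f"{entry} -> {title}")
```

Both "Dune" entries reduce to the same title here, so only the first one is kept.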
Here is the complete Python code:
import json
import os
import re
import stat
import time
from urllib.parse import quote_plus

import paramiko
import requests
from tqdm import tqdm

# Read the configuration file
with open('config.json', 'r') as f:
    config = json.load(f)

hostname = config['hostname']
port = config['port']
username = config['username']
password = config['password']
remote_path = config['remote_path']

# Local directory for the saved state, and the download directory
local_temp_path = '/home/vaidotak/python/ftp_project/temp/'
download_path = '/home/vaidotak/Atsiuntimai/'

# Google search helper: returns the search URL if Google answers with HTTP 200
def search_google(title):
    try:
        search_url = f"https://www.google.com/search?q={quote_plus(title)}"
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }
        # A timeout keeps the script from hanging on a stalled request
        response = requests.get(search_url, headers=headers, timeout=10)
        if response.status_code == 200:
            return search_url
        else:
            print(f"Google request failed: {response.status_code}")
    except Exception as e:
        print(f"Error during search: {e}")
    return None

# Connect to the SFTP server and check the remote directory for new entries
def check_new_files_and_dirs():
    transport = None
    sftp = None
    try:
        # Connect to the SFTP server
        transport = paramiko.Transport((hostname, port))
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)

        # Get the remote directory listing
        current_files_and_dirs = sftp.listdir_attr(remote_path)

        # Load the names saved by the previous check, if there are any
        os.makedirs(local_temp_path, exist_ok=True)
        previous_check_file = os.path.join(local_temp_path, 'previous_check.txt')
        if os.path.exists(previous_check_file):
            with open(previous_check_file, 'r') as f:
                previous_files_and_dirs = [line.rstrip('\n') for line in f]
        else:
            previous_files_and_dirs = []

        # Build the current list of names and a name -> modification time lookup
        current_files_and_dirs_names = [entry.filename for entry in current_files_and_dirs]
        mtimes = {entry.filename: entry.st_mtime for entry in current_files_and_dirs}

        # New files and directories, sorted so the numbering is stable
        new_entries = sorted(set(current_files_and_dirs_names) - set(previous_files_and_dirs))
        processed_titles = set()
        indexed_new_entries = []

        if new_entries:
            print("New files or directories found:")
            for entry in new_entries:
                # Take the book title up to the parenthesis, dropping the year and the "Audiobook" tag
                match = re.match(r"^(.*?)(\s*\(.*\))?(\s*Audiobook)?$", entry)
                if match:
                    book_title = match.group(1).strip()
                    if book_title not in processed_titles:
                        processed_titles.add(book_title)
                        # Number only the entries that are shown, so the numbers stay contiguous
                        index = len(indexed_new_entries) + 1
                        google_link = search_google(book_title)
                        file_mtime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtimes[entry]))
                        print(f"{index}. \033[92m{entry}\033[0m ({file_mtime})")
                        if google_link:
                            print(google_link)
                        else:
                            print(f"No Google link found for: {book_title}")
                        indexed_new_entries.append((index, entry))
                        # Pause between requests
                        time.sleep(1)
        else:
            print("No new files or directories.")

        # Save the current names for the next check
        with open(previous_check_file, 'w') as f:
            for entry in current_files_and_dirs_names:
                f.write(entry + '\n')

        return indexed_new_entries, current_files_and_dirs
    except Exception as e:
        print(f"Error: {e}")
        return [], []
    finally:
        # Disconnect even if something failed
        if sftp is not None:
            sftp.close()
        if transport is not None:
            transport.close()

# Download the selected entries, showing a progress bar for each file
def download_files(sftp, files_to_download, download_path):
    # Copy one remote file to a local path in chunks, updating the progress bar
    def download_one(remote_file, local_file, label):
        file_size = sftp.stat(remote_file).st_size
        with open(local_file, 'wb') as f, sftp.file(remote_file, 'rb') as remote_file_handle:
            with tqdm(total=file_size, unit='B', unit_scale=True, desc=label) as pbar:
                while True:
                    data = remote_file_handle.read(32768)
                    if not data:
                        break
                    f.write(data)
                    pbar.update(len(data))

    # Look entries up by their displayed number rather than by list position
    entries_by_index = dict(indexed_new_entries)
    for file_index in files_to_download:
        file_entry = entries_by_index.get(file_index)
        if file_entry is None:
            print(f"No entry with number {file_index}, skipping.")
            continue
        remote_file_path = os.path.join(remote_path, file_entry)
        local_file_path = os.path.join(download_path, file_entry)
        try:
            # Check whether the entry is a directory
            file_info = sftp.lstat(remote_file_path)
            if stat.S_ISDIR(file_info.st_mode):
                # Create the local directory if it does not exist
                os.makedirs(local_file_path, exist_ok=True)
                # Download every file in the directory tree recursively
                for dirpath, dirnames, filenames in sftp_walk(sftp, remote_file_path):
                    # Mirror the remote directory structure locally
                    local_dirpath = os.path.join(download_path, os.path.relpath(dirpath, remote_path))
                    os.makedirs(local_dirpath, exist_ok=True)
                    for filename in filenames:
                        download_one(os.path.join(dirpath, filename),
                                     os.path.join(local_dirpath, filename),
                                     filename)
            else:
                download_one(remote_file_path, local_file_path, file_entry)
            print(f"{file_entry} downloaded successfully.")
        except Exception as e:
            print(f"Error downloading {file_entry}: {e}")

# Recursively walk a remote directory, like os.walk, yielding (path, folders, files)
def sftp_walk(sftp, remotepath):
    path = remotepath
    files = []
    folders = []
    for f in sftp.listdir_attr(remotepath):
        if stat.S_ISDIR(f.st_mode):
            folders.append(f.filename)
        else:
            files.append(f.filename)
    yield path, folders, files
    for folder in folders:
        new_path = os.path.join(remotepath, folder)
        yield from sftp_walk(sftp, new_path)

# Main entry point
def main():
    global indexed_new_entries
    indexed_new_entries, current_files_and_dirs = check_new_files_and_dirs()
    if indexed_new_entries:
        download_path = '/home/vaidotak/Atsiuntimai/'
        selection = input("Enter the numbers of the files to download (e.g. 1, 4, 18) or 'all' to download everything: ")
        if selection.strip().lower() == 'all':
            files_to_download = [entry[0] for entry in indexed_new_entries]
        else:
            try:
                files_to_download = [int(part) for part in selection.split(',') if part.strip()]
            except ValueError:
                print("Invalid selection: expected numbers separated by commas.")
                return
        # Open a new SFTP connection for the download
        transport = paramiko.Transport((hostname, port))
        try:
            transport.connect(username=username, password=password)
            sftp = paramiko.SFTPClient.from_transport(transport)
            download_files(sftp, files_to_download, download_path)
            sftp.close()
        finally:
            transport.close()


if __name__ == "__main__":
    main()
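
The selection prompt in main() accepts either a comma-separated list of numbers or a keyword for everything. A stricter way to parse that input might look like the sketch below. The function name parse_selection and its parameters are invented for this example, and the keyword that selects everything is passed in as all_keyword so it can match whatever word the prompt advertises.

```python
def parse_selection(selection, valid_indexes, all_keyword="all"):
    """Turn user input such as "1, 4, 18" into a list of valid entry numbers."""
    valid = sorted(set(valid_indexes))
    text = selection.strip().lower()
    if text == all_keyword:
        return valid
    chosen = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        try:
            number = int(part)
        except ValueError:
            raise ValueError(f"Not a number: {part!r}") from None
        if number not in valid:
            raise ValueError(f"No entry with number {number}")
        if number not in chosen:  # ignore repeated numbers
            chosen.append(number)
    return chosen


if __name__ == "__main__":
    # Hypothetical entry numbers shown to the user
    print(parse_selection("1, 4, 18", [1, 4, 18, 20]))
```

Rejecting unknown numbers before connecting avoids opening an SFTP session only to find there is nothing valid to download.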
The config.json file has the following structure (values masked):
{
    "hostname": "**.***.***.**",
    "port": 22,
    "username": "*******",
    "password": "******",
    "remote_path": "/***********/********"
}
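
Because the script reads these keys directly, a missing key only shows up as a KeyError at startup. One possible refinement, sketched here as an assumption rather than as part of the original script, is to validate the file when it is loaded. The names REQUIRED_KEYS, validate_config and load_config are invented for this example.

```python
import json

# Keys the script reads from config.json
REQUIRED_KEYS = ("hostname", "port", "username", "password", "remote_path")


def validate_config(config: dict) -> dict:
    """Check that all expected keys are present and normalize the port."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing config keys: {', '.join(missing)}")
    validated = dict(config)
    # Accept the port as either a number or a numeric string
    validated["port"] = int(validated["port"])
    return validated


def load_config(path: str = "config.json") -> dict:
    """Read a JSON configuration file and validate it."""
    with open(path, "r", encoding="utf-8") as f:
        return validate_config(json.load(f))
```

With this in place, a call such as config = load_config() fails early with a message that names the missing keys.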
A short Bash wrapper activates the virtual environment and runs the script:
#!/bin/bash
# Activate the virtual environment
source /home/vaidotak/python/.venv/bin/activate
# Run the Python script
/home/vaidotak/python/.venv/bin/python /home/vaidotak/python/ftp/ftp.py
Conclusion
Using Python, we have created a more robust and flexible solution for monitoring an SFTP server for new files and directories. The Google search link printed for each new book title makes the script useful beyond monitoring: it also gives a quick way to look up what has just arrived, and the selected entries can be downloaded with a progress bar.