How to Watch a File for Changes with Python (and Send a Notification)


In this tutorial, you’ll learn how to detect when a file has changed using a Python program that runs automatically on an interval via a cronjob. Although these instructions specifically show you how to watch a PDF for changes, the concept of comparing file hashes (here, a SHA-1 hash) holds true for most other files, whether local or at a remote URL.
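
To make the hash idea concrete, here is a minimal sketch that uses Python's standard hashlib on in-memory bytes instead of real files. The byte strings and the helper name sha1_of_bytes are made up for illustration; the point is only that changing even one character of the content produces a different SHA-1 digest.

```python
import hashlib

def sha1_of_bytes(data: bytes) -> str:
    """Return the SHA-1 digest of a bytes object as a hex string."""
    return hashlib.sha1(data).hexdigest()

# Hypothetical contents standing in for two downloads of the same PDF
january = b"Fund performance: +1.2%"
february = b"Fund performance: +1.3%"

same = sha1_of_bytes(january) == sha1_of_bytes(january)      # identical content
changed = sha1_of_bytes(january) != sha1_of_bytes(february)  # one character differs

print(same, changed)
```

Both comparisons come out true: identical bytes always hash to the same value, and the one-character edit yields a different digest.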

There is also a notification aspect of the Python program which notifies my phone via a push notification. I didn’t want to mess around with setting up an email server; believe me, it’s a pain in the ass (spam filters nowadays are tough).

Anyway, follow along and maybe you’ll learn a thing or two about how you could possibly write a similar Python program to automate some aspect of your life.

Background Story

Note: Feel free to skip this section if you just want the meat of the tutorial.

Let me set the stage for you: I’m invested in this fund at Charles Schwab https://content.schwabplan.com/funddetail/CBFXP.pdf. Its performance is updated approximately every month. The problem is that I don’t know when the PDF receives its updates. Sometimes it’s in the beginning of the month. Sometimes in the middle of the month. And at least once I saw no update for the entire month.

Rather than manually checking the PDF every day or every week for changes, I decided to write a Python program to automatically check for changes to the file every day and send me a notification if it found updates.

Call me lazy or a nerd, but I like to automate certain things to make my life easier. Here’s how I did it.

Python File Change Detector + Notifier

There are roughly three main aspects to this program which detects and notifies when a file has changed:

  1. Download the file from a URL
  2. Hash the downloaded file and compare it to the previous one
  3. Notify me of any changes to the file

Additionally, there’s technically a fourth aspect, which is the scheduler (i.e. a cronjob) that executes the program on an interval. We’ll talk about that last. But first, let me walk you through the software.

Note: All code in this tutorial is available on GitHub.

1. Downloader

The first aspect of this program is the downloader. This module downloads the PDF from a URL, saves it under a date-time stamped filename, and compares its hash to the hash of the most recent PDF, which is stored as latest.pdf.

If the hashes are different, this means the files are different and an updated PDF was found. In this case, a copy of the newly downloaded file is archived and the file itself is renamed to latest.pdf. Additionally, a notification is sent to my phone.

Otherwise if the hashes are the same, we simply delete the newly downloaded file because no change has been detected.

On the first run of the program, latest.pdf doesn’t exist yet, so there’s some additional logic in the try/except block that handles this case.

import os
import shutil
import urllib.request
from datetime import datetime
from hasher import sha1
from notifier import send_notification

def move_files(filename):
    # archive a copy (the updates/ directory must already exist),
    # then make this download the new latest.pdf
    shutil.copy(filename, '/root/campbell/updates/')
    os.rename(filename, '/root/campbell/latest.pdf')

def check_for_update():
    # get date and time as string (year, month, day so filenames sort by date)
    now = datetime.now()
    filename = '/root/campbell/{}.pdf'.format(now.strftime("%Y%m%d%H%M%S"))
    
    # download file
    url = 'https://tonyteaches.tech/test.pdf'
    urllib.request.urlretrieve(url, filename)
    
    # get hashes
    try:
        hash_latest = sha1('/root/campbell/latest.pdf')
    except FileNotFoundError:
        # first run: there is no latest.pdf to compare against yet
        move_files(filename)
        print('First file saved')
        return
    hash_new = sha1(filename)
    
    # compare hashes
    if hash_latest != hash_new:
        print('Found update')
        move_files(filename)
        send_notification(url)
    else:
        print('No update')
        os.remove(filename)

check_for_update()
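
If you want to try the compare-and-archive logic without touching /root/campbell or the network, here is a rough sketch of the same decision flow with the paths passed in as arguments. The names file_sha1 and process_download, and the three status strings, are my own inventions for this example and are not part of the program above; it also checks whether latest.pdf exists up front instead of relying on an exception.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_sha1(path):
    """Return the SHA-1 hex digest of a file (reads it whole; fine for small PDFs)."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()

def process_download(new_file, latest, archive_dir):
    """Compare new_file against latest; return 'first', 'updated', or 'unchanged'."""
    new_file, latest, archive_dir = Path(new_file), Path(latest), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    if not latest.exists():
        status = 'first'
    elif file_sha1(new_file) != file_sha1(latest):
        status = 'updated'
    else:
        new_file.unlink()  # same content, so discard the fresh download
        return 'unchanged'

    shutil.copy(new_file, archive_dir / new_file.name)  # keep an archived copy
    new_file.replace(latest)                            # promote to latest
    return status

if __name__ == '__main__':
    # Demo in a throwaway directory with fake "PDF" contents
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, content in [('1.pdf', b'v1'), ('2.pdf', b'v1'), ('3.pdf', b'v2')]:
            download = root / name
            download.write_bytes(content)
            print(name, process_download(download, root / 'latest.pdf', root / 'updates'))
```

Returning a status string instead of printing keeps the function easy to test, and the caller can decide whether an 'updated' result should trigger a notification.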

2. Hasher

The next file is simply a generic SHA1 hashing function that takes in the path to a file and returns the value of its hash.

import hashlib

def sha1(filename):
    BUF_SIZE = 65536  # read the file in 64 KiB chunks
    hasher = hashlib.sha1()  # named hasher so it doesn't shadow this function
    with open(filename, 'rb') as f:
        while True:
            data = f.read(BUF_SIZE)
            if not data:
                break
            hasher.update(data)
    return hasher.hexdigest()
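
SHA-1 is fine for noticing that a file changed, but it is no longer recommended in situations where someone might deliberately craft two files with the same hash. If that matters to you, the same chunked-read pattern works with other algorithms. Here is a sketch; the function name file_hash and the SHA-256 default are my choices for this example, not something the program above uses.

```python
import hashlib

def file_hash(filename, algorithm="sha256", buf_size=65536):
    """Hash a file in chunks with any algorithm name that hashlib.new() accepts."""
    hasher = hashlib.new(algorithm)
    with open(filename, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(buf_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Calling file_hash(path, "sha1") should give the same result as the sha1 function above, so switching algorithms only changes the stored digests, not the comparison logic.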

3. Notifier

The last piece of this Python program is the notification logic. As I mentioned earlier, email notifications would be ideal in this case, but setting up an email server is a whole project in itself, so I went with the simplest solution I could find.

Enter Pushsafer, a German push notification service that gives you 50 free calls to their API. Additional API calls can be bought for pretty cheap.

Anyway, I tapped into their API with the following code, which sends a notification to the app on my Android phone any time a change in the PDF file is detected.

Since I should only receive up to 12 notifications per year (assuming the PDF file only changes one time per month), this free service should last me over four years.

from urllib.parse import urlencode
from urllib.request import Request, urlopen

def send_notification(url):
    post_fields = {
        'k':'xxxxxxxxxxxxxxxxxxxx',
        't':'New PDF found!',
        'd':'a',
        'm':'A new PDF was found.',
        'u':url
    }

    ps = 'https://www.pushsafer.com/api'
    request = Request(ps, urlencode(post_fields).encode())
    # send the POST request and hand Pushsafer's response body back to the caller
    response = urlopen(request).read().decode()
    return response
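
The notifier above hardcodes the API key and will crash the whole run if the request fails. Here is one possible variation, as a sketch only: it sends the same form fields to the same endpoint, but reads the key from an environment variable and swallows network errors. The variable name PUSHSAFER_KEY, the build_request helper, and the 10-second timeout are assumptions I made for this example.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PUSHSAFER_API = 'https://www.pushsafer.com/api'

def build_request(url, key):
    """Build the POST request with the same form fields as the notifier above."""
    post_fields = {
        'k': key,
        't': 'New PDF found!',
        'd': 'a',
        'm': 'A new PDF was found.',
        'u': url,
    }
    # Passing a data payload makes urllib send this as a POST
    return Request(PUSHSAFER_API, urlencode(post_fields).encode())

def send_notification(url, key=None, timeout=10):
    """Send the push notification; return the response body, or None on failure."""
    key = key or os.environ.get('PUSHSAFER_KEY')  # hypothetical variable name
    if not key:
        print('No Pushsafer key configured; skipping notification')
        return None
    try:
        with urlopen(build_request(url, key), timeout=timeout) as response:
            return response.read().decode()
    except OSError as err:
        # URLError, HTTPError, and socket timeouts all derive from OSError
        print('Notification failed: {}'.format(err))
        return None
```

Keeping the key out of the source file means you can push the code to GitHub without leaking it. Note that a cronjob does not inherit your shell's environment, so the variable would need to be set in the crontab itself or in a wrapper script.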

4. Scheduler

The (technically) final piece of the setup is the cronjob that executes the Python program on an interval. I have it scheduled to run every morning at 6 AM server time.

0 6 * * * python3 /root/campbell/downloader.py

Next Steps

I’ll be the first to admit that the above code isn’t optimized, could use a lot more error handling, and makes me cringe when I see the hardcoded values. I provided it here for basic demonstration purposes.

That being said, if you’re more of a visual learner, here’s a video demo I put together that walks you through this program in action.

YouTube video

2 Responses

  1. Thanks for the tutorial Tony!

    Is there also a way of the program to tell what is changed in the pdf file?
