James Goodall

Automating Office 365 Email Attachment Downloads with Python

Updated: Dec 2, 2020

Recently I had a scenario put to me where a colleague wanted to automate some aspects of their data collection. Their process at the time was that they would receive an automated email from a system that would contain a password protected .zip attachment (secure right?). Within this there was a .csv document and so they would manually download this file and start completing some analysis on it.


I felt that the best solution to their problem would involve Python, and after a bit of research I settled on the exchangelib module, which seemed able to do just what I needed (as we use Office 365).



I arranged for the auto-generated email to be sent to a specific mailbox that we in our team use (for the purpose of this post I’ll call it ‘team@dataandstuff.co.uk’) at 05:00 each morning, and put a quick rule in place that would automatically move these emails to a sub-folder within the Inbox (let’s call this folder ‘Beans’ for now). Within the ‘Beans’ sub-folder I then created another folder called ‘archive’ (where we would move the processed email to once we had done what we needed).

So on to writing the script – well we obviously needed the exchangelib module, but we didn’t need everything from that so I started with


from exchangelib import Credentials, Account, FileAttachment

(note: you will need to install this module manually first – if you are using pip then this is ‘pip install exchangelib’)


Now – remember that the .zip file was password protected so we needed a way to deal with that and that would be through the module zipfile, so your script now looks like


from exchangelib import Credentials, Account, FileAttachment
import zipfile

(zipfile is part of Python’s standard library, so unlike exchangelib there is nothing extra to install)


We will also need to import the os module so that we can specify where to save the files to later on


from exchangelib import Credentials, Account, FileAttachment
import zipfile
import os
 

We’ll start off by specifying some variables for the username and password that we use to sign into our Office 365 account, as well as the password to unlock the .zip file. For the purpose of this blog let’s save the credentials as variables (you’d probably want to store these more securely normally):


username='dataandstuffjames@gmail.com'
pwd='hello_world'
zippwd='hello_world'
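
As one possible way of storing these a little more securely, the values could be read from environment variables instead of being typed into the script. This is only a sketch: the variable names O365_USERNAME, O365_PASSWORD and ZIP_PASSWORD and the helper load_secrets are made up for illustration and are not part of exchangelib.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = ("O365_USERNAME", "O365_PASSWORD", "ZIP_PASSWORD")


def load_secrets(environ=os.environ):
    """Return (username, pwd, zippwd) read from the given environment mapping."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        # Fail early with a clear message rather than at login time.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(environ[name] for name in REQUIRED_VARS)
```

With those variables set, username, pwd, zippwd = load_secrets() would take the place of the three hard-coded assignments.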

Next, we’ll need to assign these to a ‘credentials’ variable


credentials = Credentials(username,pwd)

And following that let’s set our account variable (i.e. the mailbox we will be connecting to):


account = Account('team@dataandstuff.co.uk', credentials=credentials, autodiscover=True)

Now we’ll need to specify where we want to download the .zip file to, and where we want to save the unzipped contents – again through variables:


local_path = r'D:\Email\downloads'
unzip_path = r'D:\Email\downloads\unzipped'

(Note: Choose your own directories for this)
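
Opening a file for writing fails if its directory does not exist, so it may be worth creating both directories up front. A minimal sketch, assuming a hypothetical helper called ensure_dirs (the name is mine, not from any library):

```python
import os


def ensure_dirs(*paths):
    """Create each directory (and any missing parents) if it is not already there."""
    for path in paths:
        # exist_ok=True makes this safe to call on every run of the script.
        os.makedirs(path, exist_ok=True)
    return paths
```

Calling ensure_dirs(local_path, unzip_path) once, before the download loop, should be enough.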


Next up, let’s specify which folder in the mailbox we want to be reading (remember I created the sub-folder ‘Beans’ earlier?). You can view all available folders by printing out the account root tree as follows:


print(account.root.tree())

If you scroll down to find the ‘Top of Information Store’ section you can then view the path structure beneath that. So for our structure this will be:

folder = account.root/'Top of Information Store'/'Inbox'/'Beans'

We also want to say where we will be moving the email to after we’ve done our processing (remember that archive sub-folder we created?):

to_folder = folder/'archive'

OK now we can put all that into play in a few very short lines of code:

for item in folder.all().order_by('datetime_received'):
    for attachment in item.attachments:
        # Only file attachments whose name ends in .zip are of interest
        if isinstance(attachment, FileAttachment) and str(attachment.name).lower().endswith('.zip'):
            download_path = os.path.join(local_path, attachment.name)
            # Save the attachment to disk
            with open(download_path, 'wb') as f:
                f.write(attachment.content)
            # Unzip it, passing the password as bytes
            with zipfile.ZipFile(download_path) as zip_ref:
                zip_ref.extractall(unzip_path, pwd=zippwd.encode())
    # Move the processed email to the archive folder
    item.move(to_folder)

What this is basically doing is looking at all emails in the ‘Beans’ folder (ordered by datetime_received, ascending) and then, one by one, looking at all attachments within those emails. If the attachment is indeed a file attachment (and a .zip file), it downloads the .zip file and unzips the contents to the ‘unzip_path’ location we specified earlier (passing the zippwd variable, encoded to bytes, as the password). Finally it moves the email to the ‘archive’ sub-folder.
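
The download-and-unzip step can also be pulled out into a small function, which makes it possible to try it without a mailbox. This is a sketch under my own assumptions: save_and_extract is a hypothetical name, and taking os.path.basename of the attachment name is an extra precaution I have added. As far as I know, the standard zipfile module can only decrypt the traditional ZIP password scheme, not AES-encrypted archives.

```python
import os
import zipfile


def save_and_extract(name, content, local_path, unzip_path, zippwd):
    """Write attachment bytes to local_path, then unzip them into unzip_path."""
    # Keep only the file name so an odd attachment name cannot point elsewhere.
    download_path = os.path.join(local_path, os.path.basename(name))
    with open(download_path, "wb") as f:
        f.write(content)
    with zipfile.ZipFile(download_path) as zip_ref:
        # pwd must be bytes; it is ignored for members that are not encrypted.
        zip_ref.extractall(unzip_path, pwd=zippwd.encode())
        return zip_ref.namelist()
```

Inside the loop this would be called as save_and_extract(attachment.name, attachment.content, local_path, unzip_path, zippwd).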


You can then complete whatever processing you need to on the files’ contents (e.g. load them into a pandas dataframe).
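
As a rough illustration of that follow-on step, here is a sketch that reads every extracted .csv file using only the standard library csv module. pandas.read_csv could be swapped in if you want a dataframe. The function name read_csv_rows is hypothetical.

```python
import csv
import glob
import os


def read_csv_rows(unzip_path):
    """Yield (file_name, row_dict) for every row of every .csv file in unzip_path."""
    for csv_path in sorted(glob.glob(os.path.join(unzip_path, "*.csv"))):
        # newline="" is what the csv module recommends when opening files.
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                yield os.path.basename(csv_path), row
```

Each row comes back as a dictionary keyed by the column headers, so for file_name, row in read_csv_rows(unzip_path) is enough to start working with the data.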


Our finished script now looks like this:


from exchangelib import Credentials, Account, FileAttachment
import zipfile
import os

# Credentials for the mailbox and the password for the .zip file
username = 'dataandstuffjames@gmail.com'
pwd = 'hello_world'
zippwd = 'hello_world'
credentials = Credentials(username, pwd)
account = Account('team@dataandstuff.co.uk', credentials=credentials, autodiscover=True)

# Where to save the .zip file and its unzipped contents
local_path = r'D:\Email\downloads'
unzip_path = r'D:\Email\downloads\unzipped'

# Mailbox folder to read from and folder to archive processed emails to
folder = account.root/'Top of Information Store'/'Inbox'/'Beans'
to_folder = folder/'archive'

for item in folder.all().order_by('datetime_received'):
    for attachment in item.attachments:
        # Only file attachments whose name ends in .zip are of interest
        if isinstance(attachment, FileAttachment) and str(attachment.name).lower().endswith('.zip'):
            download_path = os.path.join(local_path, attachment.name)
            # Save the attachment to disk
            with open(download_path, 'wb') as f:
                f.write(attachment.content)
            # Unzip it, passing the password as bytes
            with zipfile.ZipFile(download_path) as zip_ref:
                zip_ref.extractall(unzip_path, pwd=zippwd.encode())
    # Move the processed email to the archive folder
    item.move(to_folder)

This is obviously a basic run through and you will want to build some more conditions into your program and make it more secure, but that should be the bare bones to help you automate the download of email attachments and unzip the .zip files (if needed). You could for instance set this script up to run via Windows Task Scheduler at a specific time (our use case had the email being received just after 05:00, so this could be 05:15), then set the data to load into a SQL Server database, and you never have to even look at the emails again.
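
As an example of the kind of extra condition you might add, the sketch below checks the attachment name more strictly and reports whether extraction actually worked, so that an email could be archived only on success. Both helper names (is_zip_name, try_extract) are hypothetical, and the choice of exceptions to catch is my assumption about the most likely failures.

```python
import zipfile


def is_zip_name(name):
    """True when an attachment name looks like a .zip file (case-insensitive)."""
    return str(name or "").lower().endswith(".zip")


def try_extract(download_path, unzip_path, zippwd):
    """Return True if the archive extracted cleanly, False otherwise."""
    try:
        with zipfile.ZipFile(download_path) as zip_ref:
            zip_ref.extractall(unzip_path, pwd=zippwd.encode())
        return True
    except (zipfile.BadZipFile, RuntimeError) as exc:
        # BadZipFile: not a valid archive; RuntimeError: e.g. a wrong password.
        print(f"Could not extract {download_path}: {exc}")
        return False
```

In the main loop you could then call item.move(to_folder) only when every .zip attachment on that email returned True, leaving problem emails in ‘Beans’ for a manual look.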

The exchangelib module has wayyy more functions than this and can be very useful for automating a lot of the boring admin that comes out of your emails. You can of course find out more from the documentation available on PyPI.



I hope this was useful! I know it certainly will be for me!
