File IO

Much of the power of a programming language comes from its ability to interact with data outside its session, e.g. files and websites. Both work in more or less the same way: within Python the hard, lower-level work is handled for you and you just get a set of data you can interact with.

Behind the scenes Python does a lot of what you’d expect: it opens connections and handles, reads in blocks of a specific size and keeps track of your position in a file. It exposes all of this as very simple list-like constructs.

Let’s start off with basic file IO. We can open a file and get a file object. An object here is just a set of data and functions bundled together; we’ll talk more about it in a later session.

f = open("text.txt")

Now we have the file opened and ready to play with, but it’s important to note that the operating system lets you open a file in several different ways. You can open a file as read only, so that you can’t accidentally write over it. You can have the data returned as text, decoded from binary, or you can read the data as raw binary. All of these options are contained in something called a mode string. By default open uses a mode string of "r", which means open the file for reading as text. If we wanted to open a file for both reading and writing we would use "r+", i.e.

f = open("text.txt","r+")

But you have to be careful: opening a file with "w" automatically clears it before you start writing ("r+" does not, but it needs the file to exist already). To keep a file as is and just write to the end of it, use the append mode, "a". You can also ask for the file to be handled as binary instead of text with "rb" or "wb". You can read more about open on the Python doc page.
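To make the mode strings a bit more concrete, here is a small sketch that writes, appends, overwrites and then reads as bytes. The file name modes.txt and the temporary folder are assumptions made up for the example, and the with blocks are only there to make sure each file is closed before the next step (they are explained further down).

```python
import os
import tempfile

# Assumption: a scratch file in a temporary folder, so nothing real is overwritten.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:       # "w" creates the file (or empties an existing one)
    f.write("first\n")

with open(path, "a") as f:       # "a" keeps what is there and writes at the end
    f.write("second\n")

with open(path, "r") as f:       # "r" is read only
    after_append = f.read()

with open(path, "w") as f:       # "w" again: the old contents are gone
    f.write("third\n")

with open(path, "rb") as f:      # "rb" gives raw bytes instead of text
    raw = f.read()

print(repr(after_append))        # 'first\nsecond\n'
print(type(raw).__name__)        # bytes
```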

Now that we have our file we can read or write to it with f.read() or f.write()

f = open("test.txt","w")
f.write("I am so smart")
f.write("i really am smart")

Now if you open up the file you’ll see something weird

test.txt

I am so smarti really am smart

When writing to files you need to be very explicit: when you want a new line you have to say so

f = open("test.txt","w")
f.write("I am so smart\n")
f.write("i really am smart")

\n is a character just like the other ASCII characters we talked about. It’s just like A or b, but it has a special meaning: it tells your computer to put a line break here when displaying the text.
Because it’s just like any other character you can use it in strings, split by it, etc.

OK, now for one thing I’ve glossed over: you need to close your files
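A quick sketch of treating the newline as an ordinary character. The variable names are made up for the example.

```python
text = "I am so smart\ni really am smart"

print(len("\n"))                   # 1, the escape \n is a single character
lines = text.split("\n")           # cut the string at every newline
print(lines)                       # ['I am so smart', 'i really am smart']
print("\n".join(lines) == text)    # True, joining puts the newline back
```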

f = open("test.txt","w")
f.write("I am so smart\n")
f.write("i really am smart")
f.close()

If you don’t close your files the world won’t end, but a program that keeps opening files without closing them will eventually run out of file handles and waste memory, and data you wrote may not reach the disk until the file is closed. In addition, other programs may have trouble accessing the file while you have an open handle to it.
Now it’s very easy to forget to close a file, so file objects support a built-in Python feature called the with statement, which closes the file automatically.

with open("test.txt","w") as f:
    f.write("hi!\n")

Here the file is opened as f and stays open only for the indented block. Afterwards it automatically gets closed, even if an error happens inside the block. For this reason you should always use a with statement; it’s there to make your life easier, so let it.
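If you want to see the automatic closing for yourself, file objects have a closed attribute you can check. This is a minimal sketch; the temporary folder is an assumption so that no real file gets touched.

```python
import os
import tempfile

# Assumption: a scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w") as f:
    f.write("hi!\n")
    print(f.closed)    # False, we are still inside the block

print(f.closed)        # True, the with statement closed the file on the way out

try:
    f.write("more")    # the name f still exists, but the file behind it is closed
except ValueError as err:
    print("cannot write:", err)
```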

Reading data

Now when reading data with f.read() you can give it a count of characters (or bytes, depending on mode) to read in, i.e.

f.read(10) # read 10 characters

You can also leave the length out, in which case Python reads everything from your current position to the end of the file.

text = f.read() # read the whole file into the variable text
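The two forms of read work together, because Python remembers where you are in the file. A small sketch, using a made-up scratch file:

```python
import os
import tempfile

# Assumption: a scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "digits.txt")
with open(path, "w") as f:
    f.write("0123456789abcdef")

with open(path) as f:
    first = f.read(10)   # the first 10 characters
    rest = f.read()      # carries on from where the last read stopped
    empty = f.read()     # nothing left once the end of the file is reached

print(first)         # 0123456789
print(rest)          # abcdef
print(repr(empty))   # ''
```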

Now it’s usually not useful to process a file character by character, we usually work in lines, so you can use

with open("test","w") as f:
    f.write("hello\nworld")
with open("test","r") as f:
    l = f.readline() # read 1 line from the file
    l2 = f.readlines() # reads all lines for a file, returns a list of lines
    print(l2) # what would this print?

It’s also important to remember that readline and readlines don’t strip the newline: f.readline() above returns "hello\n", so if you printed it you’d get 2 new lines. This is an easy fix: for any string where you want to cut off the surrounding spaces/new lines you can do line.strip() (or line.rstrip("\n") to remove only the newline).
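A common way to combine the two ideas is to loop over the file object directly, which hands back one line at a time, and strip each line as you go. This is a sketch with a made-up scratch file, not the only way to do it.

```python
import os
import tempfile

# Assumption: a scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w") as f:
    f.write("hello\nworld\n")

cleaned = []
with open(path) as f:
    for line in f:                     # the file hands back one line per loop
        cleaned.append(line.strip())   # drop the trailing "\n"

print(cleaned)   # ['hello', 'world']
```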

User Interaction

It’s common to ask a user of your script for some input and python makes that super easy

i = input("hello tell me your name: ") # i will be the response from the user

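If the user presses ctrl-c, input will raise a KeyboardInterrupt exception (or an EOFError if they signal end of input instead, e.g. ctrl-d on Linux/macOS), which ends your program unless you catch it.

sys.argv

Now when you run a python file you do so as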

python script.py

You can also pass things into the script from the command line; these are called command line arguments

python script.py arg1 arg2 arg3

If you are using PyCharm or another IDE you can do this there too: look around, there is a settings panel where you can specify command line arguments

you can access these via

import sys
sys.argv # list of arguments
# Note that argv includes the script name
print(sys.argv) # prints ['script.py', 'arg1', 'arg2', 'arg3']
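Since the script name always sits at index 0, a typical first step is to split it off from the real arguments. The helper describe_args below is a made-up name for illustration, and it is fed a hand-written list so the example runs anywhere; in a real script you would pass sys.argv.

```python
def describe_args(argv):
    """Split an argv-style list into the script name and its arguments."""
    script_name = argv[0]    # always the script itself
    arguments = argv[1:]     # everything the user typed after it
    return script_name, arguments

name, args = describe_args(["script.py", "arg1", "arg2", "arg3"])
print(name)   # script.py
print(args)   # ['arg1', 'arg2', 'arg3']

# Arguments always arrive as strings, so convert numbers yourself.
print(int("3") + 1)   # 4
```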

The internet

Now in much the same way as files we can read a webpage and search around in it. You can do this by hand with raw HTTP and some good text searching, but there are packages that make this way easier. Let’s start off with Requests.

This is a library that makes it super easy to interact with web servers. Networking and internet protocols are a bit out of the scope of an intro to Python lesson, so just to explain the absolute basics: the way the internet works is you send a request to a server, which reads the request and then completes it or rejects it. A server will reject a request if you did something wrong, e.g. you aren’t logged in or the request you made didn’t make sense. It can also reject it if something went wrong on its side, for example the server is overloaded or buggy.

+ ------- +    ====[pls give me google.com]===>    + -------------- +
|  chrome |                                        | google server  |
+ ------- +   <======[here is google.com]=======   + -------------- +

Other than requesting resources you can also give some data to the server, for example if you are making an Instagram post, or you can ask the server to delete some data, such as deleting an Instagram post. The one you will almost always be working with is a GET request, where you ask a server for something; in most cases this will be a web page.
If you decide to start using APIs you will have to use a larger variety of requests, but it’s not too hard to pick up, so we are going to focus on GET

import requests                               # you may have to install this
r = requests.get("https://www.google.com")    # send a request to the google server asking for their home page
print(r)                                      # r holds the response and a bunch of information; we only care about 2 things
print(r.status_code)                          # the status code tells us whether the server accepted our request: 200 = all good, 404 = unknown page, etc.
print(r.text)                                 # r.text is the response body as text, whatever the server gave us, usually an HTML page
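If you want a feel for what the status codes mean without hitting a server, Python’s standard library ships a table of them in http.HTTPStatus. The helper describe_status is a made-up name for this sketch; you could feed it r.status_code from a real response.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)      # raises ValueError for codes it does not know
    ok = 200 <= code < 300         # the 2xx range means the request succeeded
    return f"{code} {status.phrase} ({'success' if ok else 'not a success'})"

print(describe_status(200))   # 200 OK (success)
print(describe_status(404))   # 404 Not Found (not a success)
```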

Now if you are doing a lot of these requests this will be very slow, and you should have a chat with someone about async programming, multi threaded IO, web sessions and connection pooling. But for now we’re gonna stick with this basic structure.

Once you have the page you can either search through it for what’s important or convert it into a tree-like structure that you can walk. The latter is a lot more scalable.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.google.com")
soup = BeautifulSoup(r.text, 'html.parser')

# soup builds a tree and indexes all the stuff on the page so we can query it.
print(soup.title.string) # prints out the title of the page, in this case 'Google' (soup.title.name would just give the tag name, 'title')

You can check out more info here:

Dynamic Functions

OK, so any function you’ve written so far takes in a specific number of arguments

def f(a,b,c):
    print(a,b,c)

but obviously sometimes that doesn’t work, for example, print needs to be able to take any number of elements

print(a,b,c,d,e,f)

The way it does this is by having a catch all

def f(*args):
    print(args)

The star means: let the caller pass any number of arguments and package them into a tuple called args that I can access. You can also put this at the end of your normal "positional" arguments

def f(a,b,*args):
    print(a,b,args)

Here a and b must be given, they are positional parameters, but after that you can have 0 or more args.

f(1,2) # good
f(1,2,3,4,5,6,7) # good
f(1) # bad! need a AND b
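As a slightly more useful sketch of the same idea, here is a function that needs at least one number and happily takes more. The name total is made up for the example. The last call also shows the reverse trick: a star at the call site spreads a list out into separate arguments.

```python
def total(first, *rest):
    """Add one required number and any number of optional extras."""
    result = first
    for number in rest:      # rest is a tuple, empty if no extras were given
        result += number
    return result

print(total(1))              # 1
print(total(1, 2, 3))        # 6

numbers = [4, 5, 6]
print(total(*numbers))       # 15, the star in the call spreads the list out
```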

You can do the same again with keyword arguments, but first let’s quickly go over how you can give things default values

def f(a,b=2):
    print(a,b)

What this means is that you can pass a and b, but if someone doesn’t give you b, assume it’s 2

f(1,2)   # good
f(1)     # good
f(1,2,3) # bad, will fail
f()      # will fail
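Here is the same pattern with a made-up greet function, including what the failure looks like when a parameter without a default is left out.

```python
def greet(name, greeting="hello"):
    """Build a greeting; greeting falls back to "hello" when it is not given."""
    return f"{greeting} {name}"

print(greet("ada"))              # hello ada
print(greet("ada", "goodbye"))   # goodbye ada

try:
    greet()                      # name has no default, so this fails
except TypeError as err:
    print("failed:", err)
```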

A different concept is keyword arguments. It gets difficult to remember which argument goes where

open("rb","test.txt")  # this will fail because it's (filename, mode) not (mode, filename)

Remembering the calling structure for every function is a bit annoying, so in Python you can just tell the function exactly what you mean

def f(a,b):
    print(a,b)

f(b=2,a=1) # prints 1 2
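The same trick rescues the open call from earlier. The parameters of open are named file and mode, so once you spell them out the order stops mattering. This sketch uses a made-up scratch file.

```python
import os
import tempfile

# Assumption: a scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write("hi")

# Mode first, file second: fine, because both arguments are named.
with open(mode="r", file=path) as f:
    content = f.read()

print(content)   # hi
```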

Of course, what if we wanted users to be able to enter any keywords they wanted? We use **kwargs. The name kwargs is just a convention (it stands for keyword arguments); you can call the variable anything, **a works just as well.

def f(a,b,**kwargs):
    print(kwargs)

Here a and b are mandatory, and then you can give any number of keyword arguments, which get passed into the function as a dict

f(1,2,hello="world",big="money") # prints {'hello': 'world', 'big': 'money'}
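Putting it all together, one function can mix required arguments, defaults and both catch-alls. The function describe and the catch-all names are made up for this sketch; the last call shows that two stars at the call site spread a dict out into keyword arguments.

```python
def describe(a, b=2, *args, **kwargs):
    """Return everything the function received, so we can see where it went."""
    return {"a": a, "b": b, "args": args, "kwargs": kwargs}

print(describe(1))
# {'a': 1, 'b': 2, 'args': (), 'kwargs': {}}

print(describe(1, 5, 6, 7, hello="world"))
# {'a': 1, 'b': 5, 'args': (6, 7), 'kwargs': {'hello': 'world'}}

options = {"hello": "world", "big": "money"}
print(describe(1, **options))
# {'a': 1, 'b': 2, 'args': (), 'kwargs': {'hello': 'world', 'big': 'money'}}
```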