Debugging
By definition, if a program does something you don't expect, it's because at some point the state of the program was something you didn't expect. It follows that if your program is in a correct state at every point, then the answer it produces must also be correct.
Debuggers are therefore geared toward giving you information about the program and its state as it executes. If you can find where the state goes awry, you can fix it. Before we get to debuggers, though, it's important to structure your code so that its pieces are isolated from each other. If you have 1000 lines of code in one function and the function returns the wrong result, then the bug can be anywhere in those 1000 lines. But if the same 1000 lines are split up into 50 functions, you can find which function is returning the wrong value and know that the bug is somewhere in the 20 or so lines of that function. This reduces your search space by a lot.
There are countless reasons why splitting code up into self-contained functions is good; this is just one of them.
Consider the examples below, which contain a bug. Observe how it's easier to narrow down where the bug might be once you have an extra function.
    # Produces the intersection of two lists
    def intersection(a, b):
        i = 0
        j = 0
        c = []
        while i < len(a):
            while j < len(b):
                if a[i] == b[j]:
                    c.append(a[i])
                j += 1
            i += 1
        return c
    # intersection([1,2,3],[1,2,3]) returns [1]

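    # The same logic with the inner loop moved into its own function.
    # The bug is kept on purpose: the scan position j still carries over
    # from one element of a to the next.
    def in_list(e, b, j):
        result = False
        while j < len(b):
            if e == b[j]:
                result = True
            j += 1
        print(result)
        return result, j

    def intersection(a, b):
        i = 0
        j = 0
        c = []
        while i < len(a):
            found, j = in_list(a[i], b, j)
            if found:
                c.append(a[i])
            i += 1
        return c
    # intersection([1,2,3],[1,2,3]) == [1]
    # in_list prints True, False, False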
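Once the printout points at the helper, one way to confirm the diagnosis is to check that helper on its own with a few assertions. The sketch below is only one possible fix, under the assumption that the scan of `b` should restart at index 0 for every element; `in_list_fixed` and `intersection_fixed` are hypothetical names chosen for this illustration.

```python
def in_list_fixed(e, b):
    """Return True if e occurs anywhere in b (the scan always starts at index 0)."""
    j = 0
    while j < len(b):
        if e == b[j]:
            return True
        j += 1
    return False


def intersection_fixed(a, b):
    """Collect the elements of a that also appear in b, in a's order."""
    c = []
    for e in a:
        if in_list_fixed(e, b):
            c.append(e)
    return c


# Checking the small helper in isolation narrows the search to a few lines.
assert in_list_fixed(2, [1, 2, 3])
assert not in_list_fixed(4, [1, 2, 3])
assert intersection_fixed([1, 2, 3], [1, 2, 3]) == [1, 2, 3]
```

Because the helper no longer takes a start index, there is no position for the caller to forget to reset.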
A debugger works by letting us pause the program and step through it slowly, observing its state. For the code above we can set a _breakpoint_ (a point at which we want the program to pause and alert us) on the first line of the function.
Once we've paused, we can inspect variables and step through the program. There are four main ways to traverse a program:
- Step (often called Step Into) - execute the next line, descending into any function it calls
- Step Over - execute the next line, but run any function it calls to completion without descending into it
- Step Out - run the rest of the current function until it returns to its caller
- Continue - run the code normally until the next breakpoint
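In Python, these four actions map onto commands of the standard-library debugger `pdb`. The sketch below assumes Python 3.7 or later (for the built-in `breakpoint()`); the `total` function and its `debug` flag are hypothetical and exist only to show where the pause would happen.

```python
def total(values, debug=False):
    """Sum a list, optionally pausing in pdb before the loop starts."""
    if debug:
        # Pauses here and opens the pdb prompt. At the prompt:
        #   s (step)     -> Step: run the next line, entering any function it calls
        #   n (next)     -> Step Over: run the next line without entering calls
        #   r (return)   -> Step Out: run until the current function returns
        #   c (continue) -> Continue: run until the next breakpoint
        #   p values     -> print the current value of a variable
        breakpoint()
    result = 0
    for v in values:
        result += v
    return result


print(total([1, 2, 3]))  # debug is off, so this runs without pausing
```

Calling `total([1, 2, 3], debug=True)` from a terminal would drop into the prompt, where the commands in the comments can be tried one at a time.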
Regex
Regex is short for "Regular Expression". Regular expressions are a way to describe a language; in our case they let us succinctly describe a subset of every possible combination of ASCII characters.
Let's say you had a bunch of words and you wanted to extract every 3 letter word which ended with 'ay', i.e. if you were given
hello
goodbye
friday
day
ray
pay
You’d get
day
ray
pay
Now if you wanted to specify this language, that is, you wanted to convey to someone what text was allowed in this subset, you could say "the words in my language are aay bay cay day …", but really you could say it better as "<any letter>ay".
And this is what regular expressions do: they let us describe a language without specifying every possibility. For example, in regular expressions the dot character `.` represents "any character in the symbol set", which for us means any single ASCII character (A-Za-z0-9 etc.; by default most engines exclude the newline). So the above could be expressed as `.ay`: any character followed by `ay`.
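As a quick check of the idea, the word list above can be filtered with Python's `re` module. This is a minimal sketch; the variable names are made up for the illustration, and `re.fullmatch` is used so that the whole word has to fit the pattern.

```python
import re

words = ["hello", "goodbye", "friday", "day", "ray", "pay"]

# fullmatch requires the entire word to fit the pattern, so "friday" is
# rejected even though it ends in "ay".
three_letter_ay = [w for w in words if re.fullmatch(r".ay", w)]

print(three_letter_ay)  # ['day', 'ray', 'pay']
```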
Let's say you wanted to express every possible `hey` where the number of `e`s is variable, i.e. `hey`, `heey`, `heeeeeeey`. You couldn't list every option even if you tried, since you can put arbitrarily many `e`s between the `h` and the `y`. So instead you say `he+y`. The `+` symbolises that there can be 1 or more `e` between the `h` and the `y`. What if you wanted 0 `e`s to be an option, so `hy` is also in your language? Just use `*`, which means 0 or more: `he*y`.
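The difference between the two quantifiers can be seen by running both patterns over the same candidates. A small sketch, assuming Python's `re` module; the candidate list is invented for the illustration.

```python
import re

candidates = ["hy", "hey", "heey", "heeeeeeey", "hay"]

# + needs at least one e; * also accepts none at all.
one_or_more = [s for s in candidates if re.fullmatch(r"he+y", s)]
zero_or_more = [s for s in candidates if re.fullmatch(r"he*y", s)]

print(one_or_more)   # ['hey', 'heey', 'heeeeeeey']
print(zero_or_more)  # ['hy', 'hey', 'heey', 'heeeeeeey']
```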
Now the power comes from the fact that, via some nifty algorithms from formal languages and linguistics, you can convert any regular expression into a program which will tell you whether an input string matches the pattern or not. Almost every programming language has this feature nowadays.
    import re
    if re.match('he*y', "heeeeeey"):
        print("hello!")

This will print out hello! because "heeeeeey" is in the language specified by "he*y".
You can match any sentence which starts with "please" via `^please.*`. The `^` means the start of input: nothing should come before the first `p`. You can do a similar thing with `$` to represent the end of input.
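The two anchors can be tried side by side. The sketch below assumes Python's `re` module; the sentences and the `help$` pattern are invented for the illustration.

```python
import re

sentences = [
    "please pass the salt",
    "could you please help",
    "pleased to meet you",
]

# ^ pins the match to the start of the input. Note that "pleased ..." also
# qualifies, because it too begins with the letters p-l-e-a-s-e.
starts_with_please = [s for s in sentences if re.search(r"^please.*", s)]

# $ pins the match to the end of the input.
ends_with_help = [s for s in sentences if re.search(r"help$", s)]

print(starts_with_please)  # ['please pass the salt', 'pleased to meet you']
print(ends_with_help)      # ['could you please help']
```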
There is also the concept of escaping. Given the above, how would you specify the literal string `f*ck`? If you just used that as the pattern you'd match `ffffffffck`, not the asterisk. You can escape a special character with a backslash, which tells regex to treat it like any other character, as such: `f\*ck`.
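The same effect can be seen with a neutral literal such as `a*b`. This is a sketch assuming Python's `re` module; `re.escape` is the standard-library helper that adds the backslashes for you.

```python
import re

# Unescaped, * applies to the a before it: zero or more a's, then "b".
assert re.fullmatch(r"a*b", "aaab")
assert re.fullmatch(r"a*b", "a*b") is None

# Escaped, \* matches a literal asterisk and nothing else.
assert re.fullmatch(r"a\*b", "a*b")
assert re.fullmatch(r"a\*b", "aaab") is None

# re.escape builds the escaped pattern from a literal string.
escaped = re.escape("a*b")
print(escaped)  # a\*b
```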
There is also the ability to match a subset of the symbol set; you just use square brackets. `[hH]ello` will match `Hello` and `hello`: the first token represents one of either `H` or `h`. If you only wanted to match letters and digits and not spaces you could do `[abcdefghijklmnopqrstuvwxyzABCDEF..]`, but brackets let you use ranges, so you can do `[a-zA-Z0-9]`. Even better, you can use a built-in shortcut most languages give you in the form of `\w` (which usually also matches the underscore). You can also use `\s` to match any type of whitespace (spaces, tabs etc.), and you can use `\d` to match `[0-9]`, as in any digit.
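A few of these classes in action, as a minimal sketch assuming Python's `re` module; the sample strings are invented for the illustration.

```python
import re

# A bracketed set matches exactly one character out of the set.
assert re.fullmatch(r"[hH]ello", "Hello")
assert re.fullmatch(r"[hH]ello", "hello")
assert re.fullmatch(r"[hH]ello", "jello") is None

text = "room 42, floor 7"

digits = re.findall(r"\d+", text)            # runs of digits
tokens = re.findall(r"[a-zA-Z0-9]+", text)   # runs of letters and digits
parts = re.split(r"\s+", "tabs\tand   spaces")  # split on any whitespace

print(digits)  # ['42', '7']
print(tokens)  # ['room', '42', 'floor', '7']
print(parts)   # ['tabs', 'and', 'spaces']

# \w covers letters, digits and the underscore.
assert re.fullmatch(r"\w+", "snake_case")
```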
When specifying a symbol subset you can also ask it to match anything in the full alphabet EXCEPT something. The regex `[^Aa]+`, when it has to match the whole word, will match any word without the letter a in it, i.e. `hello` will match but `apple` and `brah` won't.
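Whether the pattern has to cover the whole word matters here. The sketch below assumes Python's `re` module and contrasts `re.fullmatch` (whole word) with `re.search` (any stretch of the word).

```python
import re

words = ["hello", "apple", "brah"]

# fullmatch: every character of the word must be something other than A or a.
no_a = [w for w in words if re.fullmatch(r"[^Aa]+", w)]
print(no_a)  # ['hello']

# search is happy with any a-free stretch inside the word, so it still
# finds something in "apple".
stretch = re.search(r"[^Aa]+", "apple").group()
print(stretch)  # pple
```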
There are more details but most of them are more fun to figure out live. Have a crack --> https://alf.nu/RegexGolf
Groups
Now it's useful to be able to specify a language, but it's a lot more useful to use that to extract information.
It's great if you can use a regex like `.+@.+\..+` to detect if a file has an email in it, but it would be cool if you could also get the email itself.
This is a common problem and thus there exists a system for it called regex groups. It allows you to run strings through your regex and, on a match, additionally extract out fields. Observe below.
    import re
    # r just means raw: it tells Python not to process backslash escapes in this
    # string, like turning \\ into \, which it would do otherwise.
    # search is just like match but is less strict about where it finds the
    # pattern (it doesn't have to be at the start of the string, for example).
    # It returns only the first match.
    m = re.search(r"(\w+)@(\w+)", "hello world my email is john@smith")
    print(m.groups())
    # prints out ('john', 'smith')
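Groups can also be given names, and a pattern can be applied to every occurrence in a string rather than just the first. The sketch below assumes Python's `re` module; the addresses are placeholders, and the pattern is deliberately simplistic rather than a real email validator.

```python
import re

text = "write to john@example.com or jane@example.org"

# Named groups (?P<name>...) are addressed by name instead of by position.
pattern = re.compile(r"(?P<user>\w+)@(?P<domain>\w+\.\w+)")

first = pattern.search(text)
print(first.group("user"))    # john
print(first.group("domain"))  # example.com

# finditer yields one match object per occurrence in the text.
emails = [(m.group("user"), m.group("domain")) for m in pattern.finditer(text)]
print(emails)  # [('john', 'example.com'), ('jane', 'example.org')]
```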
Finally, there is an important thing to understand about how regex matches: it matches greedily. The regex `hello (.*)` matched against "hello world how are you" will give you "world how are you" as your first group. Simply put, the star and plus operations will match as much as they can without violating the regex. If you would like it to match shyly (usually called lazy or non-greedy matching) you can do `hello (.*?)`, but of course here the group will capture nothing, as the least `*` can match is 0 characters.
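The three behaviours can be compared directly. A minimal sketch assuming Python's `re` module; the third pattern, with a space after the lazy group, is an addition for illustration and shows how a lazy quantifier becomes useful once something has to follow it.

```python
import re

text = "hello world how are you"

greedy = re.search(r"hello (.*)", text).group(1)
lazy = re.search(r"hello (.*?)", text).group(1)
lazy_until_space = re.search(r"hello (.*?) ", text).group(1)

print(repr(greedy))            # 'world how are you'
print(repr(lazy))              # ''
print(repr(lazy_until_space))  # 'world'
```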