PDA

View Full Version : MySQL Query String Containing Slashes


Sparhawk
10-15-07, 11:26 PM
THE BACKGROUND

I have a MySQL database containing the following columns:

X: the unique index (a number)
name: filename of photograph
mimetype: mime type of picture file (like image/jpeg, etc)
dateuploaded: date the picture was uploaded to the server
datetaken: date the user reports the picture was taken
description: description of picture contents, possibly including multiple lines separated by \n
folders: list of "\n"-separated "folders" -- like folder1\subfolder1\n
folder2\subfolder2\n
folder1\subfolder1\subfolder1b\subfolder1c\n
folder1\subfolder1\subfolder1d\subfolder1e\n
filelocation: location of the picture file on the server
thumbnail: a 'blob' containing a 150px-wide thumbnail of the picture
viewcount: number of times the picture has been viewed


I am trying to construct a MySQL query that returns any values of X when the folders field contains a given folder name, or subfolder of that folder.

My code (assume that I have already opened the access to the database outside of this function):

sub grabPicturesInFolderAndSubFolders()
{
my $folder = $_[0];
my @returnPics;

my $sth = $dbh->prepare("SELECT * FROM pictures WHERE X <> '0' AND folders REGEXP '^$folder' OR folders REGEXP '\n$folder' ORDER BY X;") or die "$!";
$sth->execute;
while ($thisLine = ($sth->fetchrow_array())[0]) {
push @returnPics, $thisLine;
}
$sth->finish;
return @returnPics;
}


This function takes the input of a desired folder, and SHOULD return an array of values of X that contain that folder in it's list of folders.

A breakdown of the query:


SELECT * FROM pictures: pictures is the name of the table
WHERE X <> '0': the picture at X = 0 should always be ignored
AND
folders REGEXP '^$folder': the first folder listed contains the desired folder
OR
folders REGEXP '\n$folder': any other folder listed contains the desired folder
ORDER BY X;: sorted in ascending order by X



The reason for the two "folders REGEXP" expressions is that the first folder isn't preceded by a \n.

THE PROBLEM
If the desired folder (stored in the $folder variable) contains any backslashes, the query returns no values.

For example, if I run the query for the value $folder = "folder1" using the values in the first quote above, I get the following array:
folder1\subfolder1
folder1\subfolder1\subfolder1b\subfolder1c
folder1\subfolder1\subfolder1d\subfolder1e


And, if I run the query for the value $folder = "folder1\subfolder1", it should return the following array:
folder1\subfolder1\subfolder1b\subfolder1c
folder1\subfolder1\subfolder1d\subfolder1e


But it actually returns:



So, any time my $folder variable contains a slash, the function always returns no values.
This is obviously not the desired result.

Just in case I needed to 'escape' out the slash character, I tried adding the following line before the query:

$folder =~ s/\\/\\\\/g;

This would alter my $folder value from "folder1\subfolder1" to "folder1\\subfolder1" thereby escaping out the backslash. The query returned the same result.

How do I use a variable whose value contains a backslash as a test variable in the regexp?

Neat Pete
10-16-07, 08:30 PM
Nobody else has replied in 24 hours, so here goes...

(a) I strongly suggest you rename the field named X to PictureID immediately. Please.

(b) Quote: "I have a MySQL database containing the following columns" - Nope, it's a table.

(c) Quote: "X: the unique index (a number) " - Nope, it's field. Indexes are quite separate. If it's unique, then it's a candidate to be the primary key. If it's some sort of system generated counter, unique by its definition, then you should definitely use it as the primary key. It's correct to put it first as you have done.

(d) The hard part is your problem about the slashes. I don't actually know the answer. The real underlying problem is that you have several things stored in the folders field and you are using clever programming to sort it out. Professional database designers would never do it this way.

(e) I'm not sure how you are organising your images. Are you saying you have several images all with the same name, but in different folders and you want to display all of them. Or are you saying you have only one image, but it could be in any one of several folders. Or something else. Defining the fields as you have done, line by line the hard way, is correct, but more detail is needed about the folders field.

(f) It's vital to hear the alarm bells ringing loudly. The moment you have several values (for the folder in this case) all stored in the one field and separated by delimiters (line breaks in this case) then it's not a relational database especially if you retrieve records based on this field. Crafty object-oriented programming might claim otherwise (or programmers who learnt Cobol in the 1960s), but don't believe it.

Neat Pete
10-16-07, 08:44 PM
Precedence of Operators.

Here is a quote from an old SQL text book dated July 1987. "When AND and OR appear in the same WHERE clause, all the ANDs are performed first; then all the ORs are performed. We say that AND has a higher precedence than OR."

You might need some round brackets.

In M$ Access and M$ SQL Server, the counter always starts from 1 and increments. The primary key when a counter is never zero or missing, so it's quite new to me that your X could be zero, NULL or an empty string.

Sparhawk
10-16-07, 08:57 PM
Nobody else has replied in 24 hours, so here goes...

(a) I strongly suggest you rename the field named X to PictureID immediately. Please.

(b) Quote: "I have a MySQL database containing the following columns" - Nope, it's a table.

(c) Quote: "X: the unique index (a number) " - Nope, it's field. Indexes are quite separate. If it's unique, then it's a candidate to be the primary key. If it's some sort of system generated counter, unique by its definition, then you should definitely use it as the primary key. It's correct to put it first as you have done.

Pardon my misuse of the terminology.


(d) The hard part is your problem about the slashes. I don't actually know the answer. The real underlying problem is that you have several things stored in the folders field and you are using clever programming to sort it out. Professional database designers would never do it this way.


I figured out the error in my code.
Changing the 'prepare' and 'execute' lines to the following:
$folder =~ s/\\/\\\\/g; # escapes the backslash character
my $sth = $dbh->prepare("SELECT * FROM pictures WHERE X <> '0' AND folders REGEXP ? OR folders REGEXP ? ORDER BY X;") or die "$!";
$sth->execute("^$folder", "\n$folder");


This produces the desired result.

(e) I'm not sure how you are organising your images. Are you saying you have several images all with the same name, but in different folders and you want to display all of them. Or are you saying you have only one image, but it could be in any one of several folders. Or something else. Defining the fields as you have done, line by line the hard way, is correct, but more detail is needed about the folders field.


The code is being used for a general purpose photo album site, into which users can upload photos. The photos are then stored in folder outside of htdocs.
Since each picture can appear in more than one "folder", they are not stored in folder order, but in folders with randomly-generated numeric names
The purpose of the 'folders' column is to store a list of delimited folders which each picture will appear in. Since there is no limit to the number of folders a picture can appear in, I didn't want to have to add database columns anytime someone exceeded the number of available folder columns. Therefore, I have the list in one field delimited by newline (\n). So, the purpose of the above code was to be able to grab all pictures that had the requested folder (or subfolder) in it's listed of folders.


(f) It's vital to hear the alarm bells ringing loudly. The moment you have several values (for the folder in this case) all stored in the one field and separated by delimiters (line breaks in this case) then it's not a relational database especially if you retrieve records based on this field. Crafty object-oriented programming might claim otherwise (or programmers who learnt Cobol in the 1960s), but don't believe it.

Sparhawk
10-16-07, 09:07 PM
Precedence of Operators.

Here is a quote from an old SQL text book dated July 1987. "When AND and OR appear in the same WHERE clause, all the ANDs are performed first; then all the ORs are performed. We say that AND has a higher precedence than OR."

You might need some round brackets.

In M$ Access and M$ SQL Server, the counter always starts from 1 and increments. The primary key when a counter is never zero or missing, so it's quite new to me that your X could be zero, NULL or an empty string.

For the AND and ORs, you raise a good point.
Here is what I want the query condition to do. Simplifying to the following pseudocode for a moment:
A AND (B OR C)

Ignoring that this is a MySQL query for a moment, lets reduce it to a basic if-then statement:

if ((A != 0) && ((B != 0) || (C != 0))) {
print qq(pass!);
} else {
print qq(fail!);
}



So for the following values of A, B, and C

A B C RESULT
0 0 0 fail!
0 0 1 fail!
0 1 0 fail!
0 1 1 fail!
1 0 0 fail!
1 0 1 pass!
1 1 0 pass!
1 1 1 pass!


So, if A = 0, fail the test regardless of the values of the other 2.
If A > 0, then pass the test if B > 0 or C > 0 or both. If B = C = 0, then fail the test.

What is the cleanest way to write this into the query WHERE statement?

Neat Pete
10-16-07, 09:31 PM
I didn't mean to be too mysterious, just leave some of the thinking to you. I always get confused with AND and OR too, especially when negatives (NOT) get involved.

Simply, just force the order, as things inside brackets get done first. WHERE A AND (B OR C). This alters the precedence, as the normal (implied) processing will be (A AND B) OR C. Your if-then-else statement has these brackets already. A, B and C are all logical things containing True or False, as you already know.

Sparhawk
10-16-07, 09:33 PM
I didn't mean to be too mysterious, just leave some of the thinking to you. I always get confused with AND and OR too, especially when negatives (NOT) get involved.

Simply, just force the order, as things inside brackets get done first. WHERE A AND (B OR C). This alters the precedence, as the normal (implied) processing will be (A AND B) OR C. Your if-then-else statement has these brackets already. A, B and C are all logical things containing True or False, as you already know.

I am (obviously) very new to MySQL, so I didn't know you could use parenthetical logic in a query.
Thanks for the tip.